thr3ads.net - Gluster users - [Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages [May 2020]

If this information is useful, please help other people find it:
Share via:

Artem Russakovskii

2020-May-02 17:18 UTC

[Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages

I don't have geo replication.

Still waiting for someone from the gluster team to chime in. They used to
be a lot more responsive here. Do you know if there is a holiday perhaps,
or have the working hours been cut due to Coronavirus currently?

I'm not inclined to try a v6 upgrade without their word first.

On Sat, May 2, 2020, 12:47 AM Strahil Nikolov <hunter86_bg at yahoo.com>
wrote:
> On May 1, 2020 8:03:50 PM GMT+03:00, Artem Russakovskii <
> archon810 at gmail.com> wrote:
> >The good news is the downgrade seems to have worked and was painless.
> >
> >zypper install --oldpackage glusterfs-5.13, restart gluster, and almost
> >immediately there are no heal pending entries anymore.
> >
> >The only things still showing up in the logs, besides some healing is
> >0-glusterfs-fuse:
> >writing to fuse device failed: No such file or directory:
> >==> mnt-androidpolice_data3.log <=> >[2020-05-01
16:54:21.085643] E
> >[fuse-bridge.c:219:check_and_dump_fuse_W]
> >(-->
> >/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
> >(-->
> >/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
> >(-->
> >/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (-->
> >/lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] ))))) 0-glusterfs-fuse:
> >writing to fuse device failed: No such file or directory
> >==> mnt-apkmirror_data1.log <=> >[2020-05-01
16:54:21.268842] E
> >[fuse-bridge.c:219:check_and_dump_fuse_W]
> >(-->
> >/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fdf2b0a624d]
> >(-->
> >/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fdf2748949a]
> >(-->
> >/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fdf274897bb]
> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fdf2a5f64f9] (-->
> >/lib64/libc.so.6(clone+0x3f)[0x7fdf2a32ef2f] ))))) 0-glusterfs-fuse:
> >writing to fuse device failed: No such file or directory
> >
> >It'd be very helpful if it had more info about what failed to write
and
> >why.
> >
> >I'd still really love to see the analysis of this failed upgrade
from
> >core
> >gluster maintainers to see what needs fixing and how we can upgrade in
> >the
> >future.
> >
> >Thanks.
> >
> >Sincerely,
> >Artem
> >
> >--
> >Founder, Android Police <http://www.androidpolice.com>, APK
Mirror
> ><http://www.apkmirror.com/>, Illogical Robot LLC
> >beerpla.net | @ArtemR <http://twitter.com/ArtemR>
> >
> >
> >On Fri, May 1, 2020 at 7:25 AM Artem Russakovskii <archon810 at
gmail.com>
> >wrote:
> >
> >> I do not have snapshots, no. I have a general file based backup,
but
> >also
> >> the other 3 nodes are up.
> >>
> >> OpenSUSE 15.1.
> >>
> >> If I try to downgrade and it doesn't work, what's the
brick
> >replacement
> >> scenario - is this still accurate?
> >>
> >
>
https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
> >>
> >> Any feedback about the issues themselves yet please? Specifically,
is
> >> there a chance this is happening because of the mismatched gluster
> >> versions? Though, what's the solution then?
> >>
> >> On Fri, May 1, 2020, 1:07 AM Strahil Nikolov <hunter86_bg at
yahoo.com>
> >> wrote:
> >>
> >>> On May 1, 2020 1:25:17 AM GMT+03:00, Artem Russakovskii <
> >>> archon810 at gmail.com> wrote:
> >>> >If more time is needed to analyze this, is this an option?
Shut
> >down
> >>> >7.5,
> >>> >downgrade it back to 5.13 and restart, or would this screw
> >something up
> >>> >badly? I didn't up the op-version yet.
> >>> >
> >>> >Thanks.
> >>> >
> >>> >Sincerely,
> >>> >Artem
> >>> >
> >>> >--
> >>> >Founder, Android Police
<http://www.androidpolice.com>, APK Mirror
> >>> ><http://www.apkmirror.com/>, Illogical Robot LLC
> >>> >beerpla.net | @ArtemR <http://twitter.com/ArtemR>
> >>> >
> >>> >
> >>> >On Thu, Apr 30, 2020 at 3:13 PM Artem Russakovskii
> >>> ><archon810 at gmail.com>
> >>> >wrote:
> >>> >
> >>> >> The number of heal pending on citadel, the one that
was upgraded
> >to
> >>> >7.5,
> >>> >> has now gone to 10s of thousands and continues to go
up.
> >>> >>
> >>> >> Sincerely,
> >>> >> Artem
> >>> >>
> >>> >> --
> >>> >> Founder, Android Police
<http://www.androidpolice.com>, APK
> >Mirror
> >>> >> <http://www.apkmirror.com/>, Illogical Robot
LLC
> >>> >> beerpla.net | @ArtemR
<http://twitter.com/ArtemR>
> >>> >>
> >>> >>
> >>> >> On Thu, Apr 30, 2020 at 2:57 PM Artem Russakovskii
> >>> ><archon810 at gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >>> Hi all,
> >>> >>>
> >>> >>> Today, I decided to upgrade one of the four
servers (citadel) we
> >>> >have to
> >>> >>> 7.5 from 5.13. There are 2 volumes, 1x4
replicate, and fuse
> >mounts
> >>> >(I sent
> >>> >>> the full details earlier in another message). If
everything
> >looked
> >>> >OK, I
> >>> >>> would have proceeded the rolling upgrade for all
of them,
> >following
> >>> >the
> >>> >>> full heal.
> >>> >>>
> >>> >>> However, as soon as I upgraded and restarted, the
logs filled
> >with
> >>> >>> messages like these:
> >>> >>>
> >>> >>> [2020-04-30 21:39:21.316149] E
> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
> >>> >>> (1298437:400:17) failed to complete successfully
> >>> >>> [2020-04-30 21:39:21.382891] E
> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
> >>> >>> (1298437:400:17) failed to complete successfully
> >>> >>> [2020-04-30 21:39:21.442440] E
> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
> >>> >>> (1298437:400:17) failed to complete successfully
> >>> >>> [2020-04-30 21:39:21.445587] E
> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
> >>> >>> (1298437:400:17) failed to complete successfully
> >>> >>> [2020-04-30 21:39:21.571398] E
> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
> >>> >>> (1298437:400:17) failed to complete successfully
> >>> >>> [2020-04-30 21:39:21.668192] E
> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
> >>> >>> (1298437:400:17) failed to complete successfully
> >>> >>>
> >>> >>>
> >>> >>> The message "I [MSGID: 108031]
> >>> >>> [afr-common.c:2581:afr_local_discovery_cbk]
> >>> >>> 0-androidpolice_data3-replicate-0: selecting
local read_child
> >>> >>> androidpolice_data3-client-3" repeated 10
times between
> >[2020-04-30
> >>> >>> 21:46:41.854675] and [2020-04-30 21:48:20.206323]
> >>> >>> The message "W [MSGID: 114031]
> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
> >>> >>> 0-androidpolice_data3-client-1: remote operation
failed
> >[Transport
> >>> >endpoint
> >>> >>> is not connected]" repeated 264 times
between [2020-04-30
> >>> >21:46:32.129567]
> >>> >>> and [2020-04-30 21:48:29.905008]
> >>> >>> The message "W [MSGID: 114031]
> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
> >>> >>> 0-androidpolice_data3-client-0: remote operation
failed
> >[Transport
> >>> >endpoint
> >>> >>> is not connected]" repeated 264 times
between [2020-04-30
> >>> >21:46:32.129602]
> >>> >>> and [2020-04-30 21:48:29.905040]
> >>> >>> The message "W [MSGID: 114031]
> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
> >>> >>> 0-androidpolice_data3-client-2: remote operation
failed
> >[Transport
> >>> >endpoint
> >>> >>> is not connected]" repeated 264 times
between [2020-04-30
> >>> >21:46:32.129512]
> >>> >>> and [2020-04-30 21:48:29.905047]
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> Once in a while, I'm seeing this:
> >>> >>> ==>
bricks/mnt-hive_block4-androidpolice_data3.log <=> >>>
>>> [2020-04-30 21:45:54.251637] I [MSGID: 115072]
> >>> >>> [server-rpc-fops_v2.c:1681:server4_setattr_cbk]
> >>> >>> 0-androidpolice_data3-server: 5725811: SETATTR /
> >>> >>>
> >>> >
> >>>
> >
>
androidpolice.com/public/wp-content/uploads/2019/03/cielo-breez-plus-hero.png
> >>> >>> (d4556eb4-f15b-412c-a42a-32b4438af557), client:
> >>> >>>
> >>>
> >>>
>
>
>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-2-RECON_NO:-1,
> >>> >>> error-xlator: androidpolice_data3-access-control
[Operation not
> >>> >permitted]
> >>> >>> [2020-04-30 21:49:10.439701] I [MSGID: 115072]
> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk]
> >>> >>> 0-androidpolice_data3-server: 201833: SETATTR /
> >>> >>> androidpolice.com/public/wp-content/uploads
> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client:
> >>> >>>
> >>>
> >>>
>
>
>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
> >>> >>> error-xlator: androidpolice_data3-access-control
[Operation not
> >>> >permitted]
> >>> >>> [2020-04-30 21:49:10.453724] I [MSGID: 115072]
> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk]
> >>> >>> 0-androidpolice_data3-server: 201842: SETATTR /
> >>> >>> androidpolice.com/public/wp-content/uploads
> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client:
> >>> >>>
> >>>
> >>>
>
>
>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
> >>> >>> error-xlator: androidpolice_data3-access-control
[Operation not
> >>> >permitted]
> >>> >>> [2020-04-30 21:49:16.224662] I [MSGID: 115072]
> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk]
> >>> >>> 0-androidpolice_data3-server: 202865: SETATTR /
> >>> >>> androidpolice.com/public/wp-content/uploads
> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client:
> >>> >>>
> >>>
> >>>
>
>
>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
> >>> >>> error-xlator: androidpolice_data3-access-control
[Operation not
> >>> >permitted]
> >>> >>>
> >>> >>> There's also lots of self-healing happening
that I didn't expect
> >at
> >>> >all,
> >>> >>> since the upgrade only took ~10-15s.
> >>> >>> [2020-04-30 21:47:38.714448] I [MSGID: 108026]
> >>> >>>
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> >>> >>> 0-apkmirror_data1-replicate-0: performing
metadata selfheal on
> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461
> >>> >>> [2020-04-30 21:47:38.765033] I [MSGID: 108026]
> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]
> >>> >>> 0-apkmirror_data1-replicate-0: Completed metadata
selfheal on
> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461. sources=[3]
sinks=0 1 2
> >>> >>> [2020-04-30 21:47:38.765289] I [MSGID: 108026]
> >>> >>>
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> >>> >>> 0-apkmirror_data1-replicate-0: performing
metadata selfheal on
> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296
> >>> >>> [2020-04-30 21:47:38.800987] I [MSGID: 108026]
> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]
> >>> >>> 0-apkmirror_data1-replicate-0: Completed metadata
selfheal on
> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296. sources=[3]
sinks=0 1 2
> >>> >>>
> >>> >>>
> >>> >>> I'm also seeing "remote operation
failed" and "writing to fuse
> >>> >device
> >>> >>> failed: No such file or directory" messages
> >>> >>> [2020-04-30 21:46:34.891957] I [MSGID: 108026]
> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]
> >>> >>> 0-androidpolice_data3-replicate-0: Completed
metadata selfheal
> >on
> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591. sources=0 1
[2]  sinks=3
> >>> >>> [2020-04-30 21:45:36.127412] W [MSGID: 114031]
> >>> >>> [client-rpc-fops_v2.c:1985:client4_0_setattr_cbk]
> >>> >>> 0-androidpolice_data3-client-0: remote operation
failed
> >[Operation
> >>> >not
> >>> >>> permitted]
> >>> >>> [2020-04-30 21:45:36.345924] W [MSGID: 114031]
> >>> >>> [client-rpc-fops_v2.c:1985:client4_0_setattr_cbk]
> >>> >>> 0-androidpolice_data3-client-1: remote operation
failed
> >[Operation
> >>> >not
> >>> >>> permitted]
> >>> >>> [2020-04-30 21:46:35.291853] I [MSGID: 108031]
> >>> >>> [afr-common.c:2543:afr_local_discovery_cbk]
> >>> >>> 0-androidpolice_data3-replicate-0: selecting
local read_child
> >>> >>> androidpolice_data3-client-2
> >>> >>> [2020-04-30 21:46:35.977342] I [MSGID: 108026]
> >>> >>>
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> >>> >>> 0-androidpolice_data3-replicate-0: performing
metadata selfheal
> >on
> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591
> >>> >>> [2020-04-30 21:46:36.006607] I [MSGID: 108026]
> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]
> >>> >>> 0-androidpolice_data3-replicate-0: Completed
metadata selfheal
> >on
> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591. sources=0 1
[2]  sinks=3
> >>> >>> [2020-04-30 21:46:37.245599] E
> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W]
> >>> >>> (-->
> >>>
>
>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
> >>> >>> (-->
> >>> >>>
> >>>
>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
> >>> >>> (-->
> >>> >>>
> >>>
>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
> >>> >>> (-->
/lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (-->
> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f]
)))))
> >0-glusterfs-fuse:
> >>> >>> writing to fuse device failed: No such file or
directory
> >>> >>> [2020-04-30 21:46:50.864797] E
> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W]
> >>> >>> (-->
> >>>
>
>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
> >>> >>> (-->
> >>> >>>
> >>>
>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
> >>> >>> (-->
> >>> >>>
> >>>
>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
> >>> >>> (-->
/lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (-->
> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f]
)))))
> >0-glusterfs-fuse:
> >>> >>> writing to fuse device failed: No such file or
directory
> >>> >>>
> >>> >>> The number of items being healed is going up and
down wildly,
> >from 0
> >>> >to
> >>> >>> 8000+ and sometimes taking a really long time to
return a value.
> >I'm
> >>> >really
> >>> >>> worried as this is a production system, and I
didn't observe
> >this in
> >>> >our
> >>> >>> test system.
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> gluster v heal apkmirror_data1 info summary
> >>> >>> Brick nexus2:/mnt/nexus2_block1/apkmirror_data1
> >>> >>> Status: Connected
> >>> >>> Total Number of entries: 27
> >>> >>> Number of entries in heal pending: 27
> >>> >>> Number of entries in split-brain: 0
> >>> >>> Number of entries possibly healing: 0
> >>> >>>
> >>> >>> Brick forge:/mnt/forge_block1/apkmirror_data1
> >>> >>> Status: Connected
> >>> >>> Total Number of entries: 27
> >>> >>> Number of entries in heal pending: 27
> >>> >>> Number of entries in split-brain: 0
> >>> >>> Number of entries possibly healing: 0
> >>> >>>
> >>> >>> Brick hive:/mnt/hive_block1/apkmirror_data1
> >>> >>> Status: Connected
> >>> >>> Total Number of entries: 27
> >>> >>> Number of entries in heal pending: 27
> >>> >>> Number of entries in split-brain: 0
> >>> >>> Number of entries possibly healing: 0
> >>> >>>
> >>> >>> Brick citadel:/mnt/citadel_block1/apkmirror_data1
> >>> >>> Status: Connected
> >>> >>> Total Number of entries: 8540
> >>> >>> Number of entries in heal pending: 8540
> >>> >>> Number of entries in split-brain: 0
> >>> >>> Number of entries possibly healing: 0
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> gluster v heal androidpolice_data3 info summary
> >>> >>> Brick
nexus2:/mnt/nexus2_block4/androidpolice_data3
> >>> >>> Status: Connected
> >>> >>> Total Number of entries: 1
> >>> >>> Number of entries in heal pending: 1
> >>> >>> Number of entries in split-brain: 0
> >>> >>> Number of entries possibly healing: 0
> >>> >>>
> >>> >>> Brick forge:/mnt/forge_block4/androidpolice_data3
> >>> >>> Status: Connected
> >>> >>> Total Number of entries: 1
> >>> >>> Number of entries in heal pending: 1
> >>> >>> Number of entries in split-brain: 0
> >>> >>> Number of entries possibly healing: 0
> >>> >>>
> >>> >>> Brick hive:/mnt/hive_block4/androidpolice_data3
> >>> >>> Status: Connected
> >>> >>> Total Number of entries: 1
> >>> >>> Number of entries in heal pending: 1
> >>> >>> Number of entries in split-brain: 0
> >>> >>> Number of entries possibly healing: 0
> >>> >>>
> >>> >>> Brick
citadel:/mnt/citadel_block4/androidpolice_data3
> >>> >>> Status: Connected
> >>> >>> Total Number of entries: 1149
> >>> >>> Number of entries in heal pending: 1149
> >>> >>> Number of entries in split-brain: 0
> >>> >>> Number of entries possibly healing: 0
> >>> >>>
> >>> >>>
> >>> >>> What should I do at this point? The files I
tested seem to be
> >>> >replicating
> >>> >>> correctly, but I don't know if it's the
case for all of them,
> >and
> >>> >the heals
> >>> >>> going up and down, and all these log messages are
making me very
> >>> >nervous.
> >>> >>>
> >>> >>> Thank you.
> >>> >>>
> >>> >>> Sincerely,
> >>> >>> Artem
> >>> >>>
> >>> >>> --
> >>> >>> Founder, Android Police
<http://www.androidpolice.com>, APK
> >Mirror
> >>> >>> <http://www.apkmirror.com/>, Illogical
Robot LLC
> >>> >>> beerpla.net | @ArtemR
<http://twitter.com/ArtemR>
> >>> >>>
> >>> >>
> >>>
> >>> I's not supported  , but usually it works.
> >>>
> >>> In worst case scenario,  you can remove the node, wipe gluster
on
> >the
> >>> node, reinstall the packages and add it - it will require full
heal
> >of the
> >>> brick and as you have previously reported could lead to
performance
> >>> degradation.
> >>>
> >>> I think you are on SLES, but I could be wrong . Do you have
btrfs or
> >LVM
> >>> snapshots to revert from ?
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>
>
> Hi Artem,
>
> You can increase the brick log level following
>
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
> but keep in mind that logs grow quite fast - so don't keep them above
the
> current level for too much time.
>
>
> Do you have a geo replication running ?
>
> About the migration issue - I have no clue why this happened. Last time I
> skipped a major release(3.12  to 5.5) I got a huge trouble (all files
> ownership was switched to root)  and I have the feeling  that it won't
> happen again if you go through v6.
>
> Best Regards,
> Strahil Nikolov
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.gluster.org/pipermail/gluster-users/attachments/20200502/a37ed1d0/attachment.html>

Strahil Nikolov

2020-May-03 05:52 UTC

head link

[Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages

On May 2, 2020 8:18:38 PM GMT+03:00, Artem Russakovskii <archon810 at
gmail.com> wrote:>I don't have geo replication.
>
>Still waiting for someone from the gluster team to chime in. They used
>to
>be a lot more responsive here. Do you know if there is a holiday
>perhaps,
>or have the working hours been cut due to Coronavirus currently?
>
>I'm not inclined to try a v6 upgrade without their word first.
>
>On Sat, May 2, 2020, 12:47 AM Strahil Nikolov <hunter86_bg at
yahoo.com>
>wrote:
>
>> On May 1, 2020 8:03:50 PM GMT+03:00, Artem Russakovskii <
>> archon810 at gmail.com> wrote:
>> >The good news is the downgrade seems to have worked and was
>painless.
>> >
>> >zypper install --oldpackage glusterfs-5.13, restart gluster, and
>almost
>> >immediately there are no heal pending entries anymore.
>> >
>> >The only things still showing up in the logs, besides some healing
>is
>> >0-glusterfs-fuse:
>> >writing to fuse device failed: No such file or directory:
>> >==> mnt-androidpolice_data3.log <=>> >[2020-05-01
16:54:21.085643] E
>> >[fuse-bridge.c:219:check_and_dump_fuse_W]
>> >(-->
>>
>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
>> >(-->
>>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
>> >(-->
>>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (-->
>> >/lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] )))))
0-glusterfs-fuse:
>> >writing to fuse device failed: No such file or directory
>> >==> mnt-apkmirror_data1.log <=>> >[2020-05-01
16:54:21.268842] E
>> >[fuse-bridge.c:219:check_and_dump_fuse_W]
>> >(-->
>>
>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fdf2b0a624d]
>> >(-->
>>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fdf2748949a]
>> >(-->
>>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fdf274897bb]
>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fdf2a5f64f9] (-->
>> >/lib64/libc.so.6(clone+0x3f)[0x7fdf2a32ef2f] )))))
0-glusterfs-fuse:
>> >writing to fuse device failed: No such file or directory
>> >
>> >It'd be very helpful if it had more info about what failed to
write
>and
>> >why.
>> >
>> >I'd still really love to see the analysis of this failed
upgrade
>from
>> >core
>> >gluster maintainers to see what needs fixing and how we can upgrade
>in
>> >the
>> >future.
>> >
>> >Thanks.
>> >
>> >Sincerely,
>> >Artem
>> >
>> >--
>> >Founder, Android Police <http://www.androidpolice.com>, APK
Mirror
>> ><http://www.apkmirror.com/>, Illogical Robot LLC
>> >beerpla.net | @ArtemR <http://twitter.com/ArtemR>
>> >
>> >
>> >On Fri, May 1, 2020 at 7:25 AM Artem Russakovskii
><archon810 at gmail.com>
>> >wrote:
>> >
>> >> I do not have snapshots, no. I have a general file based
backup,
>but
>> >also
>> >> the other 3 nodes are up.
>> >>
>> >> OpenSUSE 15.1.
>> >>
>> >> If I try to downgrade and it doesn't work, what's the
brick
>> >replacement
>> >> scenario - is this still accurate?
>> >>
>> >
>>
>https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
>> >>
>> >> Any feedback about the issues themselves yet please?
Specifically,
>is
>> >> there a chance this is happening because of the mismatched
gluster
>> >> versions? Though, what's the solution then?
>> >>
>> >> On Fri, May 1, 2020, 1:07 AM Strahil Nikolov
><hunter86_bg at yahoo.com>
>> >> wrote:
>> >>
>> >>> On May 1, 2020 1:25:17 AM GMT+03:00, Artem Russakovskii
<
>> >>> archon810 at gmail.com> wrote:
>> >>> >If more time is needed to analyze this, is this an
option? Shut
>> >down
>> >>> >7.5,
>> >>> >downgrade it back to 5.13 and restart, or would this
screw
>> >something up
>> >>> >badly? I didn't up the op-version yet.
>> >>> >
>> >>> >Thanks.
>> >>> >
>> >>> >Sincerely,
>> >>> >Artem
>> >>> >
>> >>> >--
>> >>> >Founder, Android Police
<http://www.androidpolice.com>, APK
>Mirror
>> >>> ><http://www.apkmirror.com/>, Illogical Robot LLC
>> >>> >beerpla.net | @ArtemR
<http://twitter.com/ArtemR>
>> >>> >
>> >>> >
>> >>> >On Thu, Apr 30, 2020 at 3:13 PM Artem Russakovskii
>> >>> ><archon810 at gmail.com>
>> >>> >wrote:
>> >>> >
>> >>> >> The number of heal pending on citadel, the one
that was
>upgraded
>> >to
>> >>> >7.5,
>> >>> >> has now gone to 10s of thousands and continues to
go up.
>> >>> >>
>> >>> >> Sincerely,
>> >>> >> Artem
>> >>> >>
>> >>> >> --
>> >>> >> Founder, Android Police
<http://www.androidpolice.com>, APK
>> >Mirror
>> >>> >> <http://www.apkmirror.com/>, Illogical
Robot LLC
>> >>> >> beerpla.net | @ArtemR
<http://twitter.com/ArtemR>
>> >>> >>
>> >>> >>
>> >>> >> On Thu, Apr 30, 2020 at 2:57 PM Artem
Russakovskii
>> >>> ><archon810 at gmail.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >>> Hi all,
>> >>> >>>
>> >>> >>> Today, I decided to upgrade one of the four
servers (citadel)
>we
>> >>> >have to
>> >>> >>> 7.5 from 5.13. There are 2 volumes, 1x4
replicate, and fuse
>> >mounts
>> >>> >(I sent
>> >>> >>> the full details earlier in another message).
If everything
>> >looked
>> >>> >OK, I
>> >>> >>> would have proceeded the rolling upgrade for
all of them,
>> >following
>> >>> >the
>> >>> >>> full heal.
>> >>> >>>
>> >>> >>> However, as soon as I upgraded and restarted,
the logs filled
>> >with
>> >>> >>> messages like these:
>> >>> >>>
>> >>> >>> [2020-04-30 21:39:21.316149] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc
>actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>> [2020-04-30 21:39:21.382891] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc
>actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>> [2020-04-30 21:39:21.442440] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc
>actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>> [2020-04-30 21:39:21.445587] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc
>actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>> [2020-04-30 21:39:21.571398] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc
>actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>> [2020-04-30 21:39:21.668192] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc
>actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>>
>> >>> >>>
>> >>> >>> The message "I [MSGID: 108031]
>> >>> >>> [afr-common.c:2581:afr_local_discovery_cbk]
>> >>> >>> 0-androidpolice_data3-replicate-0: selecting
local read_child
>> >>> >>> androidpolice_data3-client-3" repeated
10 times between
>> >[2020-04-30
>> >>> >>> 21:46:41.854675] and [2020-04-30
21:48:20.206323]
>> >>> >>> The message "W [MSGID: 114031]
>> >>> >>>
[client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
>> >>> >>> 0-androidpolice_data3-client-1: remote
operation failed
>> >[Transport
>> >>> >endpoint
>> >>> >>> is not connected]" repeated 264 times
between [2020-04-30
>> >>> >21:46:32.129567]
>> >>> >>> and [2020-04-30 21:48:29.905008]
>> >>> >>> The message "W [MSGID: 114031]
>> >>> >>>
[client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
>> >>> >>> 0-androidpolice_data3-client-0: remote
operation failed
>> >[Transport
>> >>> >endpoint
>> >>> >>> is not connected]" repeated 264 times
between [2020-04-30
>> >>> >21:46:32.129602]
>> >>> >>> and [2020-04-30 21:48:29.905040]
>> >>> >>> The message "W [MSGID: 114031]
>> >>> >>>
[client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
>> >>> >>> 0-androidpolice_data3-client-2: remote
operation failed
>> >[Transport
>> >>> >endpoint
>> >>> >>> is not connected]" repeated 264 times
between [2020-04-30
>> >>> >21:46:32.129512]
>> >>> >>> and [2020-04-30 21:48:29.905047]
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> Once in a while, I'm seeing this:
>> >>> >>> ==>
bricks/mnt-hive_block4-androidpolice_data3.log <=>> >>>
>>> [2020-04-30 21:45:54.251637] I [MSGID: 115072]
>> >>> >>>
[server-rpc-fops_v2.c:1681:server4_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-server: 5725811:
SETATTR /
>> >>> >>>
>> >>> >
>> >>>
>> >
>>
>androidpolice.com/public/wp-content/uploads/2019/03/cielo-breez-plus-hero.png
>> >>> >>> (d4556eb4-f15b-412c-a42a-32b4438af557),
client:
>> >>> >>>
>> >>>
>> >>>
>>
>>
>>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-2-RECON_NO:-1,
>> >>> >>> error-xlator:
androidpolice_data3-access-control [Operation
>not
>> >>> >permitted]
>> >>> >>> [2020-04-30 21:49:10.439701] I [MSGID:
115072]
>> >>> >>>
[server-rpc-fops_v2.c:1680:server4_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-server: 201833: SETATTR
/
>> >>> >>> androidpolice.com/public/wp-content/uploads
>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591),
client:
>> >>> >>>
>> >>>
>> >>>
>>
>>
>>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
>> >>> >>> error-xlator:
androidpolice_data3-access-control [Operation
>not
>> >>> >permitted]
>> >>> >>> [2020-04-30 21:49:10.453724] I [MSGID:
115072]
>> >>> >>>
[server-rpc-fops_v2.c:1680:server4_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-server: 201842: SETATTR
/
>> >>> >>> androidpolice.com/public/wp-content/uploads
>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591),
client:
>> >>> >>>
>> >>>
>> >>>
>>
>>
>>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
>> >>> >>> error-xlator:
androidpolice_data3-access-control [Operation
>not
>> >>> >permitted]
>> >>> >>> [2020-04-30 21:49:16.224662] I [MSGID:
115072]
>> >>> >>>
[server-rpc-fops_v2.c:1680:server4_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-server: 202865: SETATTR
/
>> >>> >>> androidpolice.com/public/wp-content/uploads
>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591),
client:
>> >>> >>>
>> >>>
>> >>>
>>
>>
>>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
>> >>> >>> error-xlator:
androidpolice_data3-access-control [Operation
>not
>> >>> >permitted]
>> >>> >>>
>> >>> >>> There's also lots of self-healing
happening that I didn't
>expect
>> >at
>> >>> >all,
>> >>> >>> since the upgrade only took ~10-15s.
>> >>> >>> [2020-04-30 21:47:38.714448] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>> >>> >>> 0-apkmirror_data1-replicate-0: performing
metadata selfheal
>on
>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461
>> >>> >>> [2020-04-30 21:47:38.765033] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-common.c:1723:afr_log_selfheal]
>> >>> >>> 0-apkmirror_data1-replicate-0: Completed
metadata selfheal on
>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461.
sources=[3]  sinks=0 1
>2
>> >>> >>> [2020-04-30 21:47:38.765289] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>> >>> >>> 0-apkmirror_data1-replicate-0: performing
metadata selfheal
>on
>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296
>> >>> >>> [2020-04-30 21:47:38.800987] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-common.c:1723:afr_log_selfheal]
>> >>> >>> 0-apkmirror_data1-replicate-0: Completed
metadata selfheal on
>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296.
sources=[3]  sinks=0 1
>2
>> >>> >>>
>> >>> >>>
>> >>> >>> I'm also seeing "remote operation
failed" and "writing to
>fuse
>> >>> >device
>> >>> >>> failed: No such file or directory"
messages
>> >>> >>> [2020-04-30 21:46:34.891957] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-common.c:1723:afr_log_selfheal]
>> >>> >>> 0-androidpolice_data3-replicate-0: Completed
metadata
>selfheal
>> >on
>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591.
sources=0 1 [2]
>sinks=3
>> >>> >>> [2020-04-30 21:45:36.127412] W [MSGID:
114031]
>> >>> >>>
[client-rpc-fops_v2.c:1985:client4_0_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-client-0: remote
operation failed
>> >[Operation
>> >>> >not
>> >>> >>> permitted]
>> >>> >>> [2020-04-30 21:45:36.345924] W [MSGID:
114031]
>> >>> >>>
[client-rpc-fops_v2.c:1985:client4_0_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-client-1: remote
operation failed
>> >[Operation
>> >>> >not
>> >>> >>> permitted]
>> >>> >>> [2020-04-30 21:46:35.291853] I [MSGID:
108031]
>> >>> >>> [afr-common.c:2543:afr_local_discovery_cbk]
>> >>> >>> 0-androidpolice_data3-replicate-0: selecting
local read_child
>> >>> >>> androidpolice_data3-client-2
>> >>> >>> [2020-04-30 21:46:35.977342] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>> >>> >>> 0-androidpolice_data3-replicate-0: performing
metadata
>selfheal
>> >on
>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591
>> >>> >>> [2020-04-30 21:46:36.006607] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-common.c:1723:afr_log_selfheal]
>> >>> >>> 0-androidpolice_data3-replicate-0: Completed
metadata
>selfheal
>> >on
>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591.
sources=0 1 [2]
>sinks=3
>> >>> >>> [2020-04-30 21:46:37.245599] E
>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W]
>> >>> >>> (-->
>> >>>
>>
>>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
>> >>> >>> (-->
>> >>> >>>
>> >>>
>>
>>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
>> >>> >>> (-->
>> >>> >>>
>> >>>
>>
>>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
>> >>> >>> (-->
/lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (-->
>> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f]
)))))
>> >0-glusterfs-fuse:
>> >>> >>> writing to fuse device failed: No such file
or directory
>> >>> >>> [2020-04-30 21:46:50.864797] E
>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W]
>> >>> >>> (-->
>> >>>
>>
>>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
>> >>> >>> (-->
>> >>> >>>
>> >>>
>>
>>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
>> >>> >>> (-->
>> >>> >>>
>> >>>
>>
>>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
>> >>> >>> (-->
/lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (-->
>> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f]
)))))
>> >0-glusterfs-fuse:
>> >>> >>> writing to fuse device failed: No such file
or directory
>> >>> >>>
>> >>> >>> The number of items being healed is going up
and down wildly,
>> >from 0
>> >>> >to
>> >>> >>> 8000+ and sometimes taking a really long time
to return a
>value.
>> >I'm
>> >>> >really
>> >>> >>> worried as this is a production system, and I
didn't observe
>> >this in
>> >>> >our
>> >>> >>> test system.
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> gluster v heal apkmirror_data1 info summary
>> >>> >>> Brick
nexus2:/mnt/nexus2_block1/apkmirror_data1
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 27
>> >>> >>> Number of entries in heal pending: 27
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick forge:/mnt/forge_block1/apkmirror_data1
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 27
>> >>> >>> Number of entries in heal pending: 27
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick hive:/mnt/hive_block1/apkmirror_data1
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 27
>> >>> >>> Number of entries in heal pending: 27
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick
citadel:/mnt/citadel_block1/apkmirror_data1
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 8540
>> >>> >>> Number of entries in heal pending: 8540
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> gluster v heal androidpolice_data3 info
summary
>> >>> >>> Brick
nexus2:/mnt/nexus2_block4/androidpolice_data3
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 1
>> >>> >>> Number of entries in heal pending: 1
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick
forge:/mnt/forge_block4/androidpolice_data3
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 1
>> >>> >>> Number of entries in heal pending: 1
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick
hive:/mnt/hive_block4/androidpolice_data3
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 1
>> >>> >>> Number of entries in heal pending: 1
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick
citadel:/mnt/citadel_block4/androidpolice_data3
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 1149
>> >>> >>> Number of entries in heal pending: 1149
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>>
>> >>> >>> What should I do at this point? The files I
tested seem to be
>> >>> >replicating
>> >>> >>> correctly, but I don't know if it's
the case for all of them,
>> >and
>> >>> >the heals
>> >>> >>> going up and down, and all these log messages
are making me
>very
>> >>> >nervous.
>> >>> >>>
>> >>> >>> Thank you.
>> >>> >>>
>> >>> >>> Sincerely,
>> >>> >>> Artem
>> >>> >>>
>> >>> >>> --
>> >>> >>> Founder, Android Police
<http://www.androidpolice.com>, APK
>> >Mirror
>> >>> >>> <http://www.apkmirror.com/>, Illogical
Robot LLC
>> >>> >>> beerpla.net | @ArtemR
<http://twitter.com/ArtemR>
>> >>> >>>
>> >>> >>
>> >>>
>> >>> I's not supported  , but usually it works.
>> >>>
>> >>> In worst case scenario,  you can remove the node, wipe
gluster on
>> >the
>> >>> node, reinstall the packages and add it - it will require
full
>heal
>> >of the
>> >>> brick and as you have previously reported could lead to
>performance
>> >>> degradation.
>> >>>
>> >>> I think you are on SLES, but I could be wrong . Do you
have btrfs
>or
>> >LVM
>> >>> snapshots to revert from ?
>> >>>
>> >>> Best Regards,
>> >>> Strahil Nikolov
>> >>>
>> >>
>>
>> Hi Artem,
>>
>> You can increase the brick log level following
>>
>https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
>> but keep in mind that logs grow quite fast - so don't keep them
above
>the
>> current level for too much time.
>>
>>
>> Do you have a geo replication running ?
>>
>> About the migration issue - I have no clue why this happened. Last
>time I
>> skipped a major release(3.12  to 5.5) I got a huge trouble (all files
>> ownership was switched to root)  and I have the feeling  that it
>won't
>> happen again if you go through v6.
>>
>> Best Regards,
>> Strahil Nikolov
>>
Hi Artem,

1st of May is an international holliday ,  while 6th of May is a holliday for
some contries.
I guess they will join on monday.

Best Regards,
Strahil Nikolov

Amar Tumballi

2020-May-04 13:26 UTC

head link

[Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages

On Sat, May 2, 2020 at 10:49 PM Artem Russakovskii <archon810 at
gmail.com>
wrote:
> I don't have geo replication.
>
> Still waiting for someone from the gluster team to chime in. They used to
> be a lot more responsive here. Do you know if there is a holiday perhaps,
> or have the working hours been cut due to Coronavirus currently?
>
>It was Holiday on May 1st, and 2nd and 3rd were Weekend days!  And also I
guess many of Developers from Red Hat were attending Virtual Summit!


> I'm not inclined to try a v6 upgrade without their word first.
>
Fair bet! I will bring this topic in one of the community meetings, and ask
developers if they have some feedback! I personally have not seen these
errors, and don't have a hunch on which patch would have caused an increase
in logs!

-Amar

>
> On Sat, May 2, 2020, 12:47 AM Strahil Nikolov <hunter86_bg at
yahoo.com>
> wrote:
>
>> On May 1, 2020 8:03:50 PM GMT+03:00, Artem Russakovskii <
>> archon810 at gmail.com> wrote:
>> >The good news is the downgrade seems to have worked and was
painless.
>> >
>> >zypper install --oldpackage glusterfs-5.13, restart gluster, and
almost
>> >immediately there are no heal pending entries anymore.
>> >
>> >The only things still showing up in the logs, besides some healing
is
>> >0-glusterfs-fuse:
>> >writing to fuse device failed: No such file or directory:
>> >==> mnt-androidpolice_data3.log <=>> >[2020-05-01
16:54:21.085643] E
>> >[fuse-bridge.c:219:check_and_dump_fuse_W]
>> >(-->
>>
>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
>> >(-->
>>
>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
>> >(-->
>>
>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (-->
>> >/lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] )))))
0-glusterfs-fuse:
>> >writing to fuse device failed: No such file or directory
>> >==> mnt-apkmirror_data1.log <=>> >[2020-05-01
16:54:21.268842] E
>> >[fuse-bridge.c:219:check_and_dump_fuse_W]
>> >(-->
>>
>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fdf2b0a624d]
>> >(-->
>>
>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fdf2748949a]
>> >(-->
>>
>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fdf274897bb]
>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fdf2a5f64f9] (-->
>> >/lib64/libc.so.6(clone+0x3f)[0x7fdf2a32ef2f] )))))
0-glusterfs-fuse:
>> >writing to fuse device failed: No such file or directory
>> >
>> >It'd be very helpful if it had more info about what failed to
write and
>> >why.
>> >
>> >I'd still really love to see the analysis of this failed
upgrade from
>> >core
>> >gluster maintainers to see what needs fixing and how we can upgrade
in
>> >the
>> >future.
>> >
>> >Thanks.
>> >
>> >Sincerely,
>> >Artem
>> >
>> >--
>> >Founder, Android Police <http://www.androidpolice.com>, APK
Mirror
>> ><http://www.apkmirror.com/>, Illogical Robot LLC
>> >beerpla.net | @ArtemR <http://twitter.com/ArtemR>
>> >
>> >
>> >On Fri, May 1, 2020 at 7:25 AM Artem Russakovskii <archon810 at
gmail.com>
>> >wrote:
>> >
>> >> I do not have snapshots, no. I have a general file based
backup, but
>> >also
>> >> the other 3 nodes are up.
>> >>
>> >> OpenSUSE 15.1.
>> >>
>> >> If I try to downgrade and it doesn't work, what's the
brick
>> >replacement
>> >> scenario - is this still accurate?
>> >>
>> >
>>
https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
>> >>
>> >> Any feedback about the issues themselves yet please?
Specifically, is
>> >> there a chance this is happening because of the mismatched
gluster
>> >> versions? Though, what's the solution then?
>> >>
>> >> On Fri, May 1, 2020, 1:07 AM Strahil Nikolov <hunter86_bg
at yahoo.com>
>> >> wrote:
>> >>
>> >>> On May 1, 2020 1:25:17 AM GMT+03:00, Artem Russakovskii
<
>> >>> archon810 at gmail.com> wrote:
>> >>> >If more time is needed to analyze this, is this an
option? Shut
>> >down
>> >>> >7.5,
>> >>> >downgrade it back to 5.13 and restart, or would this
screw
>> >something up
>> >>> >badly? I didn't up the op-version yet.
>> >>> >
>> >>> >Thanks.
>> >>> >
>> >>> >Sincerely,
>> >>> >Artem
>> >>> >
>> >>> >--
>> >>> >Founder, Android Police
<http://www.androidpolice.com>, APK Mirror
>> >>> ><http://www.apkmirror.com/>, Illogical Robot LLC
>> >>> >beerpla.net | @ArtemR
<http://twitter.com/ArtemR>
>> >>> >
>> >>> >
>> >>> >On Thu, Apr 30, 2020 at 3:13 PM Artem Russakovskii
>> >>> ><archon810 at gmail.com>
>> >>> >wrote:
>> >>> >
>> >>> >> The number of heal pending on citadel, the one
that was upgraded
>> >to
>> >>> >7.5,
>> >>> >> has now gone to 10s of thousands and continues to
go up.
>> >>> >>
>> >>> >> Sincerely,
>> >>> >> Artem
>> >>> >>
>> >>> >> --
>> >>> >> Founder, Android Police
<http://www.androidpolice.com>, APK
>> >Mirror
>> >>> >> <http://www.apkmirror.com/>, Illogical
Robot LLC
>> >>> >> beerpla.net | @ArtemR
<http://twitter.com/ArtemR>
>> >>> >>
>> >>> >>
>> >>> >> On Thu, Apr 30, 2020 at 2:57 PM Artem
Russakovskii
>> >>> ><archon810 at gmail.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >>> Hi all,
>> >>> >>>
>> >>> >>> Today, I decided to upgrade one of the four
servers (citadel) we
>> >>> >have to
>> >>> >>> 7.5 from 5.13. There are 2 volumes, 1x4
replicate, and fuse
>> >mounts
>> >>> >(I sent
>> >>> >>> the full details earlier in another message).
If everything
>> >looked
>> >>> >OK, I
>> >>> >>> would have proceeded the rolling upgrade for
all of them,
>> >following
>> >>> >the
>> >>> >>> full heal.
>> >>> >>>
>> >>> >>> However, as soon as I upgraded and restarted,
the logs filled
>> >with
>> >>> >>> messages like these:
>> >>> >>>
>> >>> >>> [2020-04-30 21:39:21.316149] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>> [2020-04-30 21:39:21.382891] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>> [2020-04-30 21:39:21.442440] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>> [2020-04-30 21:39:21.445587] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>> [2020-04-30 21:39:21.571398] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>> [2020-04-30 21:39:21.668192] E
>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor
>> >>> >>> (1298437:400:17) failed to complete
successfully
>> >>> >>>
>> >>> >>>
>> >>> >>> The message "I [MSGID: 108031]
>> >>> >>> [afr-common.c:2581:afr_local_discovery_cbk]
>> >>> >>> 0-androidpolice_data3-replicate-0: selecting
local read_child
>> >>> >>> androidpolice_data3-client-3" repeated
10 times between
>> >[2020-04-30
>> >>> >>> 21:46:41.854675] and [2020-04-30
21:48:20.206323]
>> >>> >>> The message "W [MSGID: 114031]
>> >>> >>>
[client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
>> >>> >>> 0-androidpolice_data3-client-1: remote
operation failed
>> >[Transport
>> >>> >endpoint
>> >>> >>> is not connected]" repeated 264 times
between [2020-04-30
>> >>> >21:46:32.129567]
>> >>> >>> and [2020-04-30 21:48:29.905008]
>> >>> >>> The message "W [MSGID: 114031]
>> >>> >>>
[client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
>> >>> >>> 0-androidpolice_data3-client-0: remote
operation failed
>> >[Transport
>> >>> >endpoint
>> >>> >>> is not connected]" repeated 264 times
between [2020-04-30
>> >>> >21:46:32.129602]
>> >>> >>> and [2020-04-30 21:48:29.905040]
>> >>> >>> The message "W [MSGID: 114031]
>> >>> >>>
[client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
>> >>> >>> 0-androidpolice_data3-client-2: remote
operation failed
>> >[Transport
>> >>> >endpoint
>> >>> >>> is not connected]" repeated 264 times
between [2020-04-30
>> >>> >21:46:32.129512]
>> >>> >>> and [2020-04-30 21:48:29.905047]
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> Once in a while, I'm seeing this:
>> >>> >>> ==>
bricks/mnt-hive_block4-androidpolice_data3.log <=>> >>>
>>> [2020-04-30 21:45:54.251637] I [MSGID: 115072]
>> >>> >>>
[server-rpc-fops_v2.c:1681:server4_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-server: 5725811:
SETATTR /
>> >>> >>>
>> >>> >
>> >>>
>> >
>>
androidpolice.com/public/wp-content/uploads/2019/03/cielo-breez-plus-hero.png
>> >>> >>> (d4556eb4-f15b-412c-a42a-32b4438af557),
client:
>> >>> >>>
>> >>>
>> >>>
>>
>>
>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-2-RECON_NO:-1,
>> >>> >>> error-xlator:
androidpolice_data3-access-control [Operation not
>> >>> >permitted]
>> >>> >>> [2020-04-30 21:49:10.439701] I [MSGID:
115072]
>> >>> >>>
[server-rpc-fops_v2.c:1680:server4_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-server: 201833: SETATTR
/
>> >>> >>> androidpolice.com/public/wp-content/uploads
>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591),
client:
>> >>> >>>
>> >>>
>> >>>
>>
>>
>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
>> >>> >>> error-xlator:
androidpolice_data3-access-control [Operation not
>> >>> >permitted]
>> >>> >>> [2020-04-30 21:49:10.453724] I [MSGID:
115072]
>> >>> >>>
[server-rpc-fops_v2.c:1680:server4_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-server: 201842: SETATTR
/
>> >>> >>> androidpolice.com/public/wp-content/uploads
>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591),
client:
>> >>> >>>
>> >>>
>> >>>
>>
>>
>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
>> >>> >>> error-xlator:
androidpolice_data3-access-control [Operation not
>> >>> >permitted]
>> >>> >>> [2020-04-30 21:49:16.224662] I [MSGID:
115072]
>> >>> >>>
[server-rpc-fops_v2.c:1680:server4_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-server: 202865: SETATTR
/
>> >>> >>> androidpolice.com/public/wp-content/uploads
>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591),
client:
>> >>> >>>
>> >>>
>> >>>
>>
>>
>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
>> >>> >>> error-xlator:
androidpolice_data3-access-control [Operation not
>> >>> >permitted]
>> >>> >>>
>> >>> >>> There's also lots of self-healing
happening that I didn't expect
>> >at
>> >>> >all,
>> >>> >>> since the upgrade only took ~10-15s.
>> >>> >>> [2020-04-30 21:47:38.714448] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>> >>> >>> 0-apkmirror_data1-replicate-0: performing
metadata selfheal on
>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461
>> >>> >>> [2020-04-30 21:47:38.765033] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-common.c:1723:afr_log_selfheal]
>> >>> >>> 0-apkmirror_data1-replicate-0: Completed
metadata selfheal on
>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461.
sources=[3]  sinks=0 1 2
>> >>> >>> [2020-04-30 21:47:38.765289] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>> >>> >>> 0-apkmirror_data1-replicate-0: performing
metadata selfheal on
>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296
>> >>> >>> [2020-04-30 21:47:38.800987] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-common.c:1723:afr_log_selfheal]
>> >>> >>> 0-apkmirror_data1-replicate-0: Completed
metadata selfheal on
>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296.
sources=[3]  sinks=0 1 2
>> >>> >>>
>> >>> >>>
>> >>> >>> I'm also seeing "remote operation
failed" and "writing to fuse
>> >>> >device
>> >>> >>> failed: No such file or directory"
messages
>> >>> >>> [2020-04-30 21:46:34.891957] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-common.c:1723:afr_log_selfheal]
>> >>> >>> 0-androidpolice_data3-replicate-0: Completed
metadata selfheal
>> >on
>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591.
sources=0 1 [2]  sinks=3
>> >>> >>> [2020-04-30 21:45:36.127412] W [MSGID:
114031]
>> >>> >>>
[client-rpc-fops_v2.c:1985:client4_0_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-client-0: remote
operation failed
>> >[Operation
>> >>> >not
>> >>> >>> permitted]
>> >>> >>> [2020-04-30 21:45:36.345924] W [MSGID:
114031]
>> >>> >>>
[client-rpc-fops_v2.c:1985:client4_0_setattr_cbk]
>> >>> >>> 0-androidpolice_data3-client-1: remote
operation failed
>> >[Operation
>> >>> >not
>> >>> >>> permitted]
>> >>> >>> [2020-04-30 21:46:35.291853] I [MSGID:
108031]
>> >>> >>> [afr-common.c:2543:afr_local_discovery_cbk]
>> >>> >>> 0-androidpolice_data3-replicate-0: selecting
local read_child
>> >>> >>> androidpolice_data3-client-2
>> >>> >>> [2020-04-30 21:46:35.977342] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>> >>> >>> 0-androidpolice_data3-replicate-0: performing
metadata selfheal
>> >on
>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591
>> >>> >>> [2020-04-30 21:46:36.006607] I [MSGID:
108026]
>> >>> >>>
[afr-self-heal-common.c:1723:afr_log_selfheal]
>> >>> >>> 0-androidpolice_data3-replicate-0: Completed
metadata selfheal
>> >on
>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591.
sources=0 1 [2]  sinks=3
>> >>> >>> [2020-04-30 21:46:37.245599] E
>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W]
>> >>> >>> (-->
>> >>>
>>
>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
>> >>> >>> (-->
>> >>> >>>
>> >>>
>>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
>> >>> >>> (-->
>> >>> >>>
>> >>>
>>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
>> >>> >>> (-->
/lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (-->
>> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f]
)))))
>> >0-glusterfs-fuse:
>> >>> >>> writing to fuse device failed: No such file
or directory
>> >>> >>> [2020-04-30 21:46:50.864797] E
>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W]
>> >>> >>> (-->
>> >>>
>>
>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
>> >>> >>> (-->
>> >>> >>>
>> >>>
>>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
>> >>> >>> (-->
>> >>> >>>
>> >>>
>>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
>> >>> >>> (-->
/lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (-->
>> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f]
)))))
>> >0-glusterfs-fuse:
>> >>> >>> writing to fuse device failed: No such file
or directory
>> >>> >>>
>> >>> >>> The number of items being healed is going up
and down wildly,
>> >from 0
>> >>> >to
>> >>> >>> 8000+ and sometimes taking a really long time
to return a value.
>> >I'm
>> >>> >really
>> >>> >>> worried as this is a production system, and I
didn't observe
>> >this in
>> >>> >our
>> >>> >>> test system.
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> gluster v heal apkmirror_data1 info summary
>> >>> >>> Brick
nexus2:/mnt/nexus2_block1/apkmirror_data1
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 27
>> >>> >>> Number of entries in heal pending: 27
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick forge:/mnt/forge_block1/apkmirror_data1
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 27
>> >>> >>> Number of entries in heal pending: 27
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick hive:/mnt/hive_block1/apkmirror_data1
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 27
>> >>> >>> Number of entries in heal pending: 27
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick
citadel:/mnt/citadel_block1/apkmirror_data1
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 8540
>> >>> >>> Number of entries in heal pending: 8540
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> gluster v heal androidpolice_data3 info
summary
>> >>> >>> Brick
nexus2:/mnt/nexus2_block4/androidpolice_data3
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 1
>> >>> >>> Number of entries in heal pending: 1
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick
forge:/mnt/forge_block4/androidpolice_data3
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 1
>> >>> >>> Number of entries in heal pending: 1
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick
hive:/mnt/hive_block4/androidpolice_data3
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 1
>> >>> >>> Number of entries in heal pending: 1
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>> Brick
citadel:/mnt/citadel_block4/androidpolice_data3
>> >>> >>> Status: Connected
>> >>> >>> Total Number of entries: 1149
>> >>> >>> Number of entries in heal pending: 1149
>> >>> >>> Number of entries in split-brain: 0
>> >>> >>> Number of entries possibly healing: 0
>> >>> >>>
>> >>> >>>
>> >>> >>> What should I do at this point? The files I
tested seem to be
>> >>> >replicating
>> >>> >>> correctly, but I don't know if it's
the case for all of them,
>> >and
>> >>> >the heals
>> >>> >>> going up and down, and all these log messages
are making me very
>> >>> >nervous.
>> >>> >>>
>> >>> >>> Thank you.
>> >>> >>>
>> >>> >>> Sincerely,
>> >>> >>> Artem
>> >>> >>>
>> >>> >>> --
>> >>> >>> Founder, Android Police
<http://www.androidpolice.com>, APK
>> >Mirror
>> >>> >>> <http://www.apkmirror.com/>, Illogical
Robot LLC
>> >>> >>> beerpla.net | @ArtemR
<http://twitter.com/ArtemR>
>> >>> >>>
>> >>> >>
>> >>>
>> >>> I's not supported  , but usually it works.
>> >>>
>> >>> In worst case scenario,  you can remove the node, wipe
gluster on
>> >the
>> >>> node, reinstall the packages and add it - it will require
full heal
>> >of the
>> >>> brick and as you have previously reported could lead to
performance
>> >>> degradation.
>> >>>
>> >>> I think you are on SLES, but I could be wrong . Do you
have btrfs or
>> >LVM
>> >>> snapshots to revert from ?
>> >>>
>> >>> Best Regards,
>> >>> Strahil Nikolov
>> >>>
>> >>
>>
>> Hi Artem,
>>
>> You can increase the brick log level following
>>
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
>> but keep in mind that logs grow quite fast - so don't keep them
above the
>> current level for too much time.
>>
>>
>> Do you have a geo replication running ?
>>
>> About the migration issue - I have no clue why this happened. Last time
I
>> skipped a major release(3.12  to 5.5) I got a huge trouble (all files
>> ownership was switched to root)  and I have the feeling  that it
won't
>> happen again if you go through v6.
>>
>> Best Regards,
>> Strahil Nikolov
>>
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>

-- 
--
https://kadalu.io
Container Storage made easy!
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.gluster.org/pipermail/gluster-users/attachments/20200504/6b0cee68/attachment.html>

Gluster users - May 2020 - Upgrade from 5.13 to 7.5 full of weird messages

[Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages

[Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages

[Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages