Matthew Benstead
2021-Mar-10 22:38 UTC
[Gluster-users] GeoRep Faulty after Gluster 7 to 8 upgrade - gfchangelog: wrong result
Thanks Strahil,

Right - I had come across your message in early January that v8 from the CentOS SIG was missing the SELinux rules, and had put SELinux into permissive mode after the upgrade when I saw denied messages in the audit logs.

[root at storage01 ~]# sestatus | egrep "^SELinux status|[mM]ode"
SELinux status:                 enabled
Current mode:                   permissive
Mode from config file:          enforcing

Yes - I am using an unprivileged user for georep:

[root at pcic-backup01 ~]# gluster-mountbroker status
+-------------+-------------+---------------------------+--------------+--------------------------+
|     NODE    | NODE STATUS |         MOUNT ROOT        |     GROUP    |           USERS          |
+-------------+-------------+---------------------------+--------------+--------------------------+
| 10.0.231.82 |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geoaccount(pcic-backup)  |
|  localhost  |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geoaccount(pcic-backup)  |
+-------------+-------------+---------------------------+--------------+--------------------------+

[root at pcic-backup02 ~]# gluster-mountbroker status
+-------------+-------------+---------------------------+--------------+--------------------------+
|     NODE    | NODE STATUS |         MOUNT ROOT        |     GROUP    |           USERS          |
+-------------+-------------+---------------------------+--------------+--------------------------+
| 10.0.231.81 |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geoaccount(pcic-backup)  |
|  localhost  |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geoaccount(pcic-backup)  |
+-------------+-------------+---------------------------+--------------+--------------------------+

Thanks,
 -Matthew

--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium <https://pacificclimate.org/>
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb at uvic.ca

On 3/10/21 2:11 PM, Strahil Nikolov wrote:
> I have tested georep on v8.3 and it was running quite well until you involve SELinux.
>
> Are you using SELinux?
> Are you using an unprivileged user for the georep?
>
> Also, you can check
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/sect-troubleshooting_geo-replication
>
> Best Regards,
> Strahil Nikolov
>
> On Thu, Mar 11, 2021 at 0:03, Matthew Benstead <matthewb at uvic.ca> wrote:
>
>> Hello,
>>
>> I recently upgraded my Distributed-Replicate cluster from Gluster 7.9 to 8.3 on CentOS 7 using the CentOS Storage SIG packages. I had geo-replication syncing properly before the upgrade, but now it is not working after.
>>
>> After I had upgraded both master and slave clusters I attempted to start geo-replication again, but it goes to faulty quickly:
>>
>> [root at storage01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup start
>> Starting geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful
>>
>> [root at storage01 ~]# gluster volume geo-replication status
>> MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER    SLAVE                                         SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> 10.0.231.91    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.91    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.91    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.92    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.92    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.92    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.93    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.93    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.93    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>>
>> [root at storage01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup stop
>> Stopping geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful
>>
>> I went through the gsyncd logs and see it attempts to go back through the changelogs - which would make sense - but fails:
>>
>> [2021-03-10 19:18:42.165807] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
>> [2021-03-10 19:18:42.166136] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_a/storage}, {slave_node=10.0.231.81}]
>> [2021-03-10 19:18:42.167829] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_c/storage}, {slave_node=10.0.231.82}]
>> [2021-03-10 19:18:42.172343] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
>> [2021-03-10 19:18:42.172580] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_b/storage}, {slave_node=10.0.231.82}]
>> [2021-03-10 19:18:42.235574] I [resource(worker /data/storage_c/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:42.236613] I [resource(worker /data/storage_a/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:42.238614] I [resource(worker /data/storage_b/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:44.144856] I [resource(worker /data/storage_b/storage):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.9059}]
>> [2021-03-10 19:18:44.145065] I [resource(worker /data/storage_b/storage):1116:connect] GLUSTER: Mounting gluster volume locally...
>> [2021-03-10 19:18:44.162873] I [resource(worker /data/storage_a/storage):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.9259}]
>> [2021-03-10 19:18:44.163412] I [resource(worker /data/storage_a/storage):1116:connect] GLUSTER: Mounting gluster volume locally...
>> [2021-03-10 19:18:44.167506] I [resource(worker /data/storage_c/storage):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.9316}]
>> [2021-03-10 19:18:44.167746] I [resource(worker /data/storage_c/storage):1116:connect] GLUSTER: Mounting gluster volume locally...
>> [2021-03-10 19:18:45.251372] I [resource(worker /data/storage_b/storage):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1062}]
>> [2021-03-10 19:18:45.251583] I [subcmds(worker /data/storage_b/storage):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>> [2021-03-10 19:18:45.271950] I [resource(worker /data/storage_c/storage):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1041}]
>> [2021-03-10 19:18:45.272118] I [subcmds(worker /data/storage_c/storage):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>> [2021-03-10 19:18:45.275180] I [resource(worker /data/storage_a/storage):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1116}]
>> [2021-03-10 19:18:45.275361] I [subcmds(worker /data/storage_a/storage):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>> [2021-03-10 19:18:47.265618] I [master(worker /data/storage_b/storage):1645:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage}]
>> [2021-03-10 19:18:47.265954] I [resource(worker /data/storage_b/storage):1292:service_loop] GLUSTER: Register time [{time=1615403927}]
>> [2021-03-10 19:18:47.276746] I [gsyncdstatus(worker /data/storage_b/storage):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
>> [2021-03-10 19:18:47.281194] I [gsyncdstatus(worker /data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
>> [2021-03-10 19:18:47.281404] I [master(worker /data/storage_b/storage):1559:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1614666552, 0)}, {entry_stime=(1614664113, 0)}, {etime=1615403927}]
>> [2021-03-10 19:18:47.285340] I [master(worker /data/storage_c/storage):1645:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_c-storage}]
>> [2021-03-10 19:18:47.285579] I [resource(worker /data/storage_c/storage):1292:service_loop] GLUSTER: Register time [{time=1615403927}]
>> [2021-03-10 19:18:47.287383] I [master(worker /data/storage_a/storage):1645:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_a-storage}]
>> [2021-03-10 19:18:47.287697] I [resource(worker /data/storage_a/storage):1292:service_loop] GLUSTER: Register time [{time=1615403927}]
>> [2021-03-10 19:18:47.298415] I [gsyncdstatus(worker /data/storage_c/storage):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
>> [2021-03-10 19:18:47.301342] I [gsyncdstatus(worker /data/storage_a/storage):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
>> [2021-03-10 19:18:47.304183] I [gsyncdstatus(worker /data/storage_c/storage):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
>> [2021-03-10 19:18:47.304418] I [master(worker /data/storage_c/storage):1559:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1614666552, 0)}, {entry_stime=(1614664108, 0)}, {etime=1615403927}]
>> [2021-03-10 19:18:47.305294] E [resource(worker /data/storage_c/storage):1312:service_loop] GLUSTER: Changelog History Crawl failed [{error=[Errno 0] Success}]
>> [2021-03-10 19:18:47.308124] I [gsyncdstatus(worker /data/storage_a/storage):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
>> [2021-03-10 19:18:47.308509] I [master(worker /data/storage_a/storage):1559:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1614666553, 0)}, {entry_stime=(1614664115, 0)}, {etime=1615403927}]
>> [2021-03-10 19:18:47.357470] E [resource(worker /data/storage_b/storage):1312:service_loop] GLUSTER: Changelog History Crawl failed [{error=[Errno 0] Success}]
>> [2021-03-10 19:18:47.383949] E [resource(worker /data/storage_a/storage):1312:service_loop] GLUSTER: Changelog History Crawl failed [{error=[Errno 0] Success}]
>> [2021-03-10 19:18:48.255340] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/data/storage_b/storage}]
>> [2021-03-10 19:18:48.260052] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
>> [2021-03-10 19:18:48.275651] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/data/storage_c/storage}]
>> [2021-03-10 19:18:48.278064] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/data/storage_a/storage}]
>> [2021-03-10 19:18:48.280453] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
>> [2021-03-10 19:18:48.282274] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
>> [2021-03-10 19:18:58.275702] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
>> [2021-03-10 19:18:58.276041] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_b/storage}, {slave_node=10.0.231.82}]
>> [2021-03-10 19:18:58.296252] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
>> [2021-03-10 19:18:58.296506] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_c/storage}, {slave_node=10.0.231.82}]
>> [2021-03-10 19:18:58.301290] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
>> [2021-03-10 19:18:58.301521] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_a/storage}, {slave_node=10.0.231.81}]
>> [2021-03-10 19:18:58.345817] I [resource(worker /data/storage_b/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:58.361268] I [resource(worker /data/storage_c/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:58.367985] I [resource(worker /data/storage_a/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:59.115143] I [subcmds(monitor-status):29:subcmd_monitor_status] <top>: Monitor Status Change [{status=Stopped}]
>>
>> It seems like there is an issue selecting the changelogs - perhaps similar to this issue?
>> https://github.com/gluster/glusterfs/issues/1766
>>
>> [root at storage01 storage_10.0.231.81_pcic-backup]# cat changes-data-storage_a-storage.log
>> [2021-03-10 19:18:45.284764] I [MSGID: 132028] [gf-changelog.c:577:gf_changelog_register_generic] 0-gfchangelog: Registering brick [{brick=/data/storage_a/storage}, {notify_filter=1}]
>> [2021-03-10 19:18:45.285275] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=3}]
>> [2021-03-10 19:18:45.285269] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=2}]
>> [2021-03-10 19:18:45.286615] I [socket.c:929:__socket_server_bind] 0-socket.gfchangelog: closing (AF_UNIX) reuse check socket 21
>> [2021-03-10 19:18:47.308607] I [MSGID: 132035] [gf-history-changelog.c:837:gf_history_changelog] 0-gfchangelog: Requesting historical changelogs [{start=1614666553}, {end=1615403927}]
>> [2021-03-10 19:18:47.308659] I [MSGID: 132019] [gf-history-changelog.c:755:gf_changelog_extract_min_max] 0-gfchangelog: changelogs min max [{min=1597342860}, {max=1615403927}, {total_changelogs=1250878}]
>> [2021-03-10 19:18:47.383774] E [MSGID: 132009] [gf-history-changelog.c:941:gf_history_changelog] 0-gfchangelog: wrong result [{for=end}, {start=1615403927}, {idx=1250877}]
>>
>> [root at storage01 storage_10.0.231.81_pcic-backup]# tail -7 changes-data-storage_b-storage.log
>> [2021-03-10 19:18:45.263211] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=3}]
>> [2021-03-10 19:18:45.263151] I [MSGID: 132028] [gf-changelog.c:577:gf_changelog_register_generic] 0-gfchangelog: Registering brick [{brick=/data/storage_b/storage}, {notify_filter=1}]
>> [2021-03-10 19:18:45.263294] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=2}]
>> [2021-03-10 19:18:45.264598] I [socket.c:929:__socket_server_bind] 0-socket.gfchangelog: closing (AF_UNIX) reuse check socket 23
>> [2021-03-10 19:18:47.281499] I [MSGID: 132035] [gf-history-changelog.c:837:gf_history_changelog] 0-gfchangelog: Requesting historical changelogs [{start=1614666552}, {end=1615403927}]
>> [2021-03-10 19:18:47.281551] I [MSGID: 132019] [gf-history-changelog.c:755:gf_changelog_extract_min_max] 0-gfchangelog: changelogs min max [{min=1597342860}, {max=1615403927}, {total_changelogs=1258258}]
>> [2021-03-10 19:18:47.357244] E [MSGID: 132009] [gf-history-changelog.c:941:gf_history_changelog] 0-gfchangelog: wrong result [{for=end}, {start=1615403927}, {idx=1258257}]
>>
>> Any ideas on where to debug this? I'd prefer not to have to remove and re-sync everything as there is about 240TB on the cluster...
>>
>> Thanks,
>>  -Matthew
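A note on the gfchangelog lines above: gf_history_changelog (the function named in the error) answers the history crawl from the brick's on-disk changelog index, and the min/max/total_changelogs figures reported by gf_changelog_extract_min_max come from that index. A quick, read-only sanity check of the index on one of the master bricks might look like the sketch below; the htime path assumes the default brick layout and the NUL-separated record format is an assumption, so adjust to your build if it differs:

[root at storage01 ~]# ls -l /data/storage_a/storage/.glusterfs/changelogs/htime/
[root at storage01 ~]# cat /data/storage_a/storage/.glusterfs/changelogs/htime/HTIME.* | tr '\0' '\n' | wc -l
[root at storage01 ~]# cat /data/storage_a/storage/.glusterfs/changelogs/htime/HTIME.* | tr '\0' '\n' | tail -3

If the entry count disagrees with the total_changelogs value in the log (1250878 for storage_a), or the newest recorded changelog is older than the requested end time 1615403927, that would fit the index/bisection problem discussed in the linked issue 1766.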
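Separately, since SELinux denials were what prompted the switch to permissive mode, it may be worth confirming whether gsyncd is still generating AVCs before enforcing mode is turned back on. A minimal check, assuming the standard audit tooling (ausearch and audit2allow) is installed, could be:

[root at storage01 ~]# ausearch -m avc -ts recent | grep -i -e gluster -e gsyncd
[root at storage01 ~]# ausearch -m avc -ts recent | audit2allow -w

The second command only explains why each denial happened; it does not change any policy.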
Strahil Nikolov
2021-Mar-11 17:37 UTC
[Gluster-users] GeoRep Faulty after Gluster 7 to 8 upgrade - gfchangelog: wrong result
I think you have to increase the debug logs for the geo-rep session. I will try to find the command necessary to increase it.

Best Regards,
Strahil Nikolov
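For reference, the log verbosity of a geo-replication session can usually be raised per session with the config subcommand rather than volume-wide. A sketch using the session names from this thread follows; the option spelling (log-level vs. log_level) varies between releases, so listing the current config first is the safe way to confirm it:

[root at storage01 ~]# gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup config | grep -i log
[root at storage01 ~]# gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup config log-level DEBUG
[root at storage01 ~]# gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup start

After reproducing the fault, the gsyncd.log and changes-*.log files in the session's log directory (the storage_10.0.231.81_pcic-backup directory quoted earlier in the thread) should contain the DEBUG output; setting the level back to INFO afterwards keeps the logs manageable.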