Matthew Benstead
2021-Mar-10 22:38 UTC
[Gluster-users] GeoRep Faulty after Gluster 7 to 8 upgrade - gfchangelog: wrong result
Thanks Strahil,

Right - I had come across your message in early January that v8 from the CentOS SIG was missing the SELinux rules, and had put SELinux into permissive mode after the upgrade when I saw denied messages in the audit logs.

[root at storage01 ~]# sestatus | egrep "^SELinux status|[mM]ode"
SELinux status:                 enabled
Current mode:                   permissive
Mode from config file:          enforcing

Yes - I am using an unprivileged user for georep:

[root at pcic-backup01 ~]# gluster-mountbroker status
+-------------+-------------+---------------------------+--------------+--------------------------+
|     NODE    | NODE STATUS |         MOUNT ROOT        |     GROUP    |           USERS          |
+-------------+-------------+---------------------------+--------------+--------------------------+
| 10.0.231.82 |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geoaccount(pcic-backup)  |
|  localhost  |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geoaccount(pcic-backup)  |
+-------------+-------------+---------------------------+--------------+--------------------------+

[root at pcic-backup02 ~]# gluster-mountbroker status
+-------------+-------------+---------------------------+--------------+--------------------------+
|     NODE    | NODE STATUS |         MOUNT ROOT        |     GROUP    |           USERS          |
+-------------+-------------+---------------------------+--------------+--------------------------+
| 10.0.231.81 |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geoaccount(pcic-backup)  |
|  localhost  |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geoaccount(pcic-backup)  |
+-------------+-------------+---------------------------+--------------+--------------------------+

Thanks,
 -Matthew

--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium <https://pacificclimate.org/>
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb at uvic.ca

On 3/10/21 2:11 PM, Strahil Nikolov wrote:
> I have tested georep on v8.3 and it was running quite well until you involve SELinux.
>
> Are you using SELinux?
> Are you using an unprivileged user for the georep?
>
> Also, you can check
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/sect-troubleshooting_geo-replication
>
> Best Regards,
> Strahil Nikolov
>
> On Thu, Mar 11, 2021 at 0:03, Matthew Benstead <matthewb at uvic.ca> wrote:
>
>> Hello,
>>
>> I recently upgraded my Distributed-Replicate cluster from Gluster 7.9 to 8.3 on CentOS 7 using the CentOS Storage SIG packages. I had geo-replication syncing properly before the upgrade, but now it is not working after.
>>
>> After I had upgraded both master and slave clusters I attempted to start geo-replication again, but it goes to faulty quickly:
>>
>> [root at storage01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup start
>> Starting geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful
>>
>> [root at storage01 ~]# gluster volume geo-replication status
>> MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER    SLAVE                                         SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> 10.0.231.91    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.91    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.91    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.92    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.92    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.92    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.93    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.93    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>> 10.0.231.93    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
>>
>> [root at storage01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup stop
>> Stopping geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful
>>
>> I went through the gsyncd logs and see it attempts to go back through the changelogs - which would make sense - but fails:
>>
>> [2021-03-10 19:18:42.165807] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
>> [2021-03-10 19:18:42.166136] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_a/storage}, {slave_node=10.0.231.81}]
>> [2021-03-10 19:18:42.167829] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_c/storage}, {slave_node=10.0.231.82}]
>> [2021-03-10 19:18:42.172343] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
>> [2021-03-10 19:18:42.172580] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_b/storage}, {slave_node=10.0.231.82}]
>> [2021-03-10 19:18:42.235574] I [resource(worker /data/storage_c/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:42.236613] I [resource(worker /data/storage_a/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:42.238614] I [resource(worker /data/storage_b/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:44.144856] I [resource(worker /data/storage_b/storage):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.9059}]
>> [2021-03-10 19:18:44.145065] I [resource(worker /data/storage_b/storage):1116:connect] GLUSTER: Mounting gluster volume locally...
>> [2021-03-10 19:18:44.162873] I [resource(worker /data/storage_a/storage):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.9259}]
>> [2021-03-10 19:18:44.163412] I [resource(worker /data/storage_a/storage):1116:connect] GLUSTER: Mounting gluster volume locally...
>> [2021-03-10 19:18:44.167506] I [resource(worker /data/storage_c/storage):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.9316}]
>> [2021-03-10 19:18:44.167746] I [resource(worker /data/storage_c/storage):1116:connect] GLUSTER: Mounting gluster volume locally...
>> [2021-03-10 19:18:45.251372] I [resource(worker /data/storage_b/storage):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1062}]
>> [2021-03-10 19:18:45.251583] I [subcmds(worker /data/storage_b/storage):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>> [2021-03-10 19:18:45.271950] I [resource(worker /data/storage_c/storage):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1041}]
>> [2021-03-10 19:18:45.272118] I [subcmds(worker /data/storage_c/storage):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>> [2021-03-10 19:18:45.275180] I [resource(worker /data/storage_a/storage):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.1116}]
>> [2021-03-10 19:18:45.275361] I [subcmds(worker /data/storage_a/storage):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>> [2021-03-10 19:18:47.265618] I [master(worker /data/storage_b/storage):1645:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage}]
>> [2021-03-10 19:18:47.265954] I [resource(worker /data/storage_b/storage):1292:service_loop] GLUSTER: Register time [{time=1615403927}]
>> [2021-03-10 19:18:47.276746] I [gsyncdstatus(worker /data/storage_b/storage):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
>> [2021-03-10 19:18:47.281194] I [gsyncdstatus(worker /data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
>> [2021-03-10 19:18:47.281404] I [master(worker /data/storage_b/storage):1559:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1614666552, 0)}, {entry_stime=(1614664113, 0)}, {etime=1615403927}]
>> [2021-03-10 19:18:47.285340] I [master(worker /data/storage_c/storage):1645:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_c-storage}]
>> [2021-03-10 19:18:47.285579] I [resource(worker /data/storage_c/storage):1292:service_loop] GLUSTER: Register time [{time=1615403927}]
>> [2021-03-10 19:18:47.287383] I [master(worker /data/storage_a/storage):1645:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_a-storage}]
>> [2021-03-10 19:18:47.287697] I [resource(worker /data/storage_a/storage):1292:service_loop] GLUSTER: Register time [{time=1615403927}]
>> [2021-03-10 19:18:47.298415] I [gsyncdstatus(worker /data/storage_c/storage):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
>> [2021-03-10 19:18:47.301342] I [gsyncdstatus(worker /data/storage_a/storage):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
>> [2021-03-10 19:18:47.304183] I [gsyncdstatus(worker /data/storage_c/storage):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
>> [2021-03-10 19:18:47.304418] I [master(worker /data/storage_c/storage):1559:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1614666552, 0)}, {entry_stime=(1614664108, 0)}, {etime=1615403927}]
>> [2021-03-10 19:18:47.305294] E [resource(worker /data/storage_c/storage):1312:service_loop] GLUSTER: Changelog History Crawl failed [{error=[Errno 0] Success}]
>> [2021-03-10 19:18:47.308124] I [gsyncdstatus(worker /data/storage_a/storage):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
>> [2021-03-10 19:18:47.308509] I [master(worker /data/storage_a/storage):1559:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1614666553, 0)}, {entry_stime=(1614664115, 0)}, {etime=1615403927}]
>> [2021-03-10 19:18:47.357470] E [resource(worker /data/storage_b/storage):1312:service_loop] GLUSTER: Changelog History Crawl failed [{error=[Errno 0] Success}]
>> [2021-03-10 19:18:47.383949] E [resource(worker /data/storage_a/storage):1312:service_loop] GLUSTER: Changelog History Crawl failed [{error=[Errno 0] Success}]
>> [2021-03-10 19:18:48.255340] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/data/storage_b/storage}]
>> [2021-03-10 19:18:48.260052] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
>> [2021-03-10 19:18:48.275651] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/data/storage_c/storage}]
>> [2021-03-10 19:18:48.278064] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/data/storage_a/storage}]
>> [2021-03-10 19:18:48.280453] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
>> [2021-03-10 19:18:48.282274] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
>> [2021-03-10 19:18:58.275702] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
>> [2021-03-10 19:18:58.276041] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_b/storage}, {slave_node=10.0.231.82}]
>> [2021-03-10 19:18:58.296252] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
>> [2021-03-10 19:18:58.296506] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_c/storage}, {slave_node=10.0.231.82}]
>> [2021-03-10 19:18:58.301290] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
>> [2021-03-10 19:18:58.301521] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/data/storage_a/storage}, {slave_node=10.0.231.81}]
>> [2021-03-10 19:18:58.345817] I [resource(worker /data/storage_b/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:58.361268] I [resource(worker /data/storage_c/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:58.367985] I [resource(worker /data/storage_a/storage):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2021-03-10 19:18:59.115143] I [subcmds(monitor-status):29:subcmd_monitor_status] <top>: Monitor Status Change [{status=Stopped}]
>>
>> It seems like there is an issue selecting the changelogs - perhaps similar to this issue?
>> https://github.com/gluster/glusterfs/issues/1766
>>
>> [root at storage01 storage_10.0.231.81_pcic-backup]# cat changes-data-storage_a-storage.log
>> [2021-03-10 19:18:45.284764] I [MSGID: 132028] [gf-changelog.c:577:gf_changelog_register_generic] 0-gfchangelog: Registering brick [{brick=/data/storage_a/storage}, {notify_filter=1}]
>> [2021-03-10 19:18:45.285275] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=3}]
>> [2021-03-10 19:18:45.285269] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=2}]
>> [2021-03-10 19:18:45.286615] I [socket.c:929:__socket_server_bind] 0-socket.gfchangelog: closing (AF_UNIX) reuse check socket 21
>> [2021-03-10 19:18:47.308607] I [MSGID: 132035] [gf-history-changelog.c:837:gf_history_changelog] 0-gfchangelog: Requesting historical changelogs [{start=1614666553}, {end=1615403927}]
>> [2021-03-10 19:18:47.308659] I [MSGID: 132019] [gf-history-changelog.c:755:gf_changelog_extract_min_max] 0-gfchangelog: changelogs min max [{min=1597342860}, {max=1615403927}, {total_changelogs=1250878}]
>> [2021-03-10 19:18:47.383774] E [MSGID: 132009] [gf-history-changelog.c:941:gf_history_changelog] 0-gfchangelog: wrong result [{for=end}, {start=1615403927}, {idx=1250877}]
>>
>> [root at storage01 storage_10.0.231.81_pcic-backup]# tail -7 changes-data-storage_b-storage.log
>> [2021-03-10 19:18:45.263211] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=3}]
>> [2021-03-10 19:18:45.263151] I [MSGID: 132028] [gf-changelog.c:577:gf_changelog_register_generic] 0-gfchangelog: Registering brick [{brick=/data/storage_b/storage}, {notify_filter=1}]
>> [2021-03-10 19:18:45.263294] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=2}]
>> [2021-03-10 19:18:45.264598] I [socket.c:929:__socket_server_bind] 0-socket.gfchangelog: closing (AF_UNIX) reuse check socket 23
>> [2021-03-10 19:18:47.281499] I [MSGID: 132035] [gf-history-changelog.c:837:gf_history_changelog] 0-gfchangelog: Requesting historical changelogs [{start=1614666552}, {end=1615403927}]
>> [2021-03-10 19:18:47.281551] I [MSGID: 132019] [gf-history-changelog.c:755:gf_changelog_extract_min_max] 0-gfchangelog: changelogs min max [{min=1597342860}, {max=1615403927}, {total_changelogs=1258258}]
>> [2021-03-10 19:18:47.357244] E [MSGID: 132009] [gf-history-changelog.c:941:gf_history_changelog] 0-gfchangelog: wrong result [{for=end}, {start=1615403927}, {idx=1258257}]
>>
>> Any ideas on where to debug this? I'd prefer not to have to remove and re-sync everything as there is about 240TB on the cluster...
>>
>> Thanks,
>>  -Matthew
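A note on the gfchangelog lines above: gf_history_changelog (the function named in the error) answers the history crawl from the brick's on-disk changelog index, and the min/max/total_changelogs figures reported by gf_changelog_extract_min_max come from that index. A quick, read-only sanity check of the index on one of the master bricks might look like the sketch below; the htime path assumes the default brick layout and the NUL-separated record format is an assumption, so adjust to your build if it differs:

[root at storage01 ~]# ls -l /data/storage_a/storage/.glusterfs/changelogs/htime/
[root at storage01 ~]# cat /data/storage_a/storage/.glusterfs/changelogs/htime/HTIME.* | tr '\0' '\n' | wc -l
[root at storage01 ~]# cat /data/storage_a/storage/.glusterfs/changelogs/htime/HTIME.* | tr '\0' '\n' | tail -3

If the entry count disagrees with the total_changelogs value in the log (1250878 for storage_a), or the newest recorded changelog is older than the requested end time 1615403927, that would fit the index/bisection problem discussed in the linked issue 1766.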
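Separately, since SELinux denials were what prompted the switch to permissive mode, it may be worth confirming whether gsyncd is still generating AVCs before enforcing mode is turned back on. A minimal check, assuming the standard audit tooling (ausearch and audit2allow) is installed, could be:

[root at storage01 ~]# ausearch -m avc -ts recent | grep -i -e gluster -e gsyncd
[root at storage01 ~]# ausearch -m avc -ts recent | audit2allow -w

The second command only explains why each denial happened; it does not change any policy.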
Strahil Nikolov
2021-Mar-11 17:37 UTC
[Gluster-users] GeoRep Faulty after Gluster 7 to 8 upgrade - gfchangelog: wrong result
I think you have to increase the debug logs for the geo-rep session. I will try to find the command necessary to increase it.

Best Regards,
Strahil Nikolov
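For reference, the log verbosity of a geo-replication session can usually be raised per session with the config subcommand rather than volume-wide. A sketch using the session names from this thread follows; the option spelling (log-level vs. log_level) varies between releases, so listing the current config first is the safe way to confirm it:

[root at storage01 ~]# gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup config | grep -i log
[root at storage01 ~]# gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup config log-level DEBUG
[root at storage01 ~]# gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup start

After reproducing the fault, the gsyncd.log and changes-*.log files in the session's log directory (the storage_10.0.231.81_pcic-backup directory quoted earlier in the thread) should contain the DEBUG output; setting the level back to INFO afterwards keeps the logs manageable.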