Strahil Nikolov
2024-Jan-28  22:07 UTC
[Gluster-users] Geo-replication status is getting Faulty after few seconds
Gluster doesn't use the ssh key in /root/.ssh, thus you need to exchange the
public key that corresponds to /var/lib/glusterd/geo-replication/secret.pem. If
you don't know the pub key, google how to obtain it from the private key.
Ensure that all hosts can ssh to the secondary before proceeding with the
troubleshooting.
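For example, something along these lines should work (a rough sketch; it assumes root is the geo-rep user and drtier1data is the secondary, so adjust to your setup). ssh-keygen -y prints the public key that matches a private key, ssh-copy-id appends it on the secondary, and the last command is the test that has to succeed from every master:
# ssh-keygen -y -f /var/lib/glusterd/geo-replication/secret.pem > /tmp/georep.pub
# ssh-copy-id -f -i /tmp/georep.pub root@drtier1data
# ssh -i /var/lib/glusterd/geo-replication/secret.pem root@drtier1data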
Best Regards,
Strahil Nikolov
 
 
On Sun, Jan 28, 2024 at 15:58, Anant Saraswat <anant.saraswat at techblue.co.uk> wrote:

Hi All,

I have now copied /var/lib/glusterd/geo-replication/secret.pem.pub (public key)
from master3 to drtier1data /root/.ssh/authorized_keys, and now I can ssh from
master node3 to drtier1data using the georep key
(/var/lib/glusterd/geo-replication/secret.pem).

But I am still getting the same error, and geo-replication is getting faulty
again and again.
[2024-01-28 13:46:38.897683] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449598}]
[2024-01-28 13:46:38.922491] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:46:38.923127] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:46:38.923313] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449598}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:46:39.973584] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-28 13:46:40.98970] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-28 13:46:40.757691] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:46:40.766860] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:46:50.793311] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:46:50.793469] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:46:50.874474] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:46:52.659114] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7844}]
[2024-01-28 13:46:52.659461] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-28 13:46:53.698769] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0392}]
[2024-01-28 13:46:53.698984] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-01-28 13:46:55.831999] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:46:55.832354] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449615}]
[2024-01-28 13:46:55.854684] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:46:55.855251] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:46:55.855419] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449615}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:46:56.905496] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-28 13:46:57.38262] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-28 13:46:57.704128] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:46:57.706743] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:07.741438] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:07.741582] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:07.821284] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:47:09.573661] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7521}]
[2024-01-28 13:47:09.573955] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-28 13:47:10.612173] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0381}]
[2024-01-28 13:47:10.612359] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-01-28 13:47:12.751856] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:47:12.752237] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449632}]
[2024-01-28 13:47:12.759138] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:47:12.759690] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:47:12.759868] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449632}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:47:13.810321] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-28 13:47:13.924068] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-28 13:47:14.617663] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:47:14.620035] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:24.646013] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:24.646157] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:24.725510] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:47:26.491939] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7662}]
[2024-01-28 13:47:26.492235] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-28 13:47:27.530852] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0385}]
[2024-01-28 13:47:27.531036] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-01-28 13:47:29.670099] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:47:29.670640] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449649}]
[2024-01-28 13:47:29.696144] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:47:29.696709] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:47:29.696899] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449649}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:47:30.751127] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-28 13:47:30.885824] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-28 13:47:31.535252] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:47:31.538450] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:41.564276] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:41.564426] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:41.645110] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:47:43.435830] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7904}]
[2024-01-28 13:47:43.436285] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-28 13:47:44.475671] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0393}]
[2024-01-28 13:47:44.475865] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-01-28 13:47:46.630478] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:47:46.630924] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449666}]
[2024-01-28 13:47:46.655069] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:47:46.655752] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:47:46.655926] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449666}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:47:47.706875] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-28 13:47:47.834996] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-28 13:47:48.480822] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:47:48.491306] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:58.518263] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:58.518412] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:58.601096] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:48:00.355000] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7537}]
[2024-01-28 13:48:00.355345] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-28 13:48:01.395025] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0396}]
[2024-01-28 13:48:01.395212] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-01-28 13:48:03.541059] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:48:03.541481] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449683}]
[2024-01-28 13:48:03.567552] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:48:03.568172] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:48:03.568376] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449683}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:48:04.621488] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-28 13:48:04.742268] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-28 13:48:04.919335] I [master(worker /opt/tier1data2019/brick):2013:syncjob] Syncer: Sync Time Taken [{job=3}, {num_files=10}, {return_code=3}, {duration=0.0180}]
[2024-01-28 13:48:04.919919] E [syncdutils(worker /opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-zo_ev6yu/75785990b3233f5dbbab9f43cc3ed895.sock drtier1data:/proc/799165/cwd}, {error=3}]
[2024-01-28 13:48:05.399226] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:48:05.403931] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:48:15.430175] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:15.430308] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:15.510770] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:48:17.240311] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7294}]
[2024-01-28 13:48:17.240509] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-28 13:48:18.279007] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0384}]
[2024-01-28 13:48:18.279195] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2024-01-28 13:48:20.455937] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:48:20.456274] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449700}]
[2024-01-28 13:48:20.464288] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:48:20.464807] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:48:20.464970] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449700}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:48:21.514201] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-28 13:48:21.644609] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-28 13:48:22.284920] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:48:22.286189] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:48:32.312378] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:32.312526] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:32.393484] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:48:34.91825] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.6981}]
[2024-01-28 13:48:34.92130] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
Thanks,
Anant
________________________________
From: Anant Saraswat <anant.saraswat at techblue.co.uk>
Sent: 28 January 2024 1:33 AM
To: Strahil Nikolov <hunter86_bg at yahoo.com>; gluster-users at gluster.org <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Hi @Strahil Nikolov,
I have checked the ssh connection from all the master servers and I can ssh
drtier1data from master1 and master2 server (old master servers), but I am unable
to ssh drtier1data from master3 (new node).
[root at master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root at drtier1data
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in <module>
    main()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 259, in main
    if args.subcmd in ("worker"):
TypeError: 'in <string>' requires string as left operand, not NoneType
Connection to drtier1data closed.
But I am able to ssh drtier1data from master3 without using the georep key.
[root at master3 ~]# ssh root at drtier1data
Last login: Sun Jan 28 01:16:25 2024 from 87.246.74.32
[root at drtier1data ~]#
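(Side note: the traceback above does not necessarily mean the key was rejected. The keys that geo-replication pushes to the secondary are normally installed in /root/.ssh/authorized_keys with a forced command that runs gsyncd instead of a login shell, so an interactive ssh with secret.pem is not expected to drop into a prompt. If in doubt, the restriction can be checked on drtier1data with something like:
# grep gsyncd /root/.ssh/authorized_keys
which typically shows entries of the form command="/usr/libexec/glusterfs/gsyncd" ssh-rsa ...)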
Also, today I restarted the gluster server on master1 as geo-replication is
trying to be active from master1 server, and sometimes I am getting the
following error in gsyncd.log
[2024-01-28 01:27:24.722663] E [syncdutils(worker
/opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync
-aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs
--existing --xattrs --acls --ignore-missing-args . -e ssh
-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-0exuoeg7/75785990b3233f5dbbab9f43cc3ed895.sock
drtier1data:/proc/553418/cwd}, {error=3}]
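(For reference, rsync exit code 3 means "errors selecting input/output files, dirs", which in a geo-rep worker usually surfaces once the auxiliary mount on one side has already gone away. The secondary keeps its own geo-replication logs, so if the mount on drtier1data is the one dying, checking its recent logs may show why; paths below are the usual defaults, adjust if logging was relocated:
# ls -lrt /var/log/glusterfs/geo-replication-slaves/
# find /var/log/glusterfs/geo-replication-slaves/ -name '*.log' -mmin -60)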
Many thanks,
Anant
________________________________
From: Strahil Nikolov <hunter86_bg at yahoo.com>
Sent: 27 January 2024 5:25 AM
To: gluster-users at gluster.org <gluster-users at gluster.org>; Anant Saraswat <anant.saraswat at techblue.co.uk>
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.
Don't forget to test with the georep key. I think it was
/var/lib/glusterd/geo-replication/secret.pem
Best Regards,
Strahil Nikolov
On Saturday, 27 January 2024 at 07:24:07 GMT+2, Strahil Nikolov
<hunter86_bg at yahoo.com> wrote:
Hi Anant,
I would first start checking if you can do ssh from all masters to the slave
node. If you haven't set up a dedicated user for the session, then gluster is
using root.
Best Regards,
Strahil Nikolov
On Friday, 26 January 2024 at 18:07:59 GMT+2, Anant Saraswat
<anant.saraswat at techblue.co.uk> wrote:
Hi All,
I have run the following commands on master3, and that has added master3 to
geo-replication.
gluster system:: execute gsec_create
gluster volume geo-replication tier1data drtier1data::drtier1data create
push-pem force
gluster volume geo-replication tier1data drtier1data::drtier1data stop
gluster volume geo-replication tier1data drtier1data::drtier1data start
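(After a create/start like the above, the session can be sanity-checked from any master; for example, assuming the same volume and secondary names, the following should show the session directory on every master and the per-brick status:
# ls /var/lib/glusterd/geo-replication/
# gluster volume geo-replication tier1data drtier1data::drtier1data status detail)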
Now I am able to start the geo-replication, but I am getting the same error.
[2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-24 19:51:24.158021] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-24 19:51:25.951998] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7938}]
[2024-01-24 19:51:25.952292] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-24 19:51:26.986974] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0346}]
[2024-01-24 19:51:26.987137] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn
successful. Acknowledging back to monitor
[2024-01-24 19:51:29.139131] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-24 19:51:29.139531] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706125889}]
[2024-01-24 19:51:29.173877] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-24 19:51:29.174407] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-24 19:51:29.174558] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889},
{entry_stime=(1705935991, 0)}]
[2024-01-24 19:51:30.251965] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-24 19:51:30.376715] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount
process exited [{error=ENOTCONN}]
[2024-01-24 19:51:30.991856] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-24 19:51:30.993608] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
Any idea why it's stuck in this loop?
Thanks,
Anant
________________________________
From: Gluster-users <gluster-users-bounces at gluster.org> on behalf of
Anant Saraswat <anant.saraswat at techblue.co.uk>
Sent: 22 January 2024 9:00 PM
To: gluster-users at gluster.org <gluster-users at gluster.org>
Subject: [Gluster-users] Geo-replication status is getting Faulty after few
seconds
EXTERNAL: Do not click links or open attachments if you do not recognize the
sender.
Hi There,
We have a Gluster setup with three master nodes in replicated mode and one slave
node with geo-replication.
# gluster volume info
Volume Name: tier1data
Type: Replicate
Volume ID: 93c45c14-f700-4d50-962b-7653be471e27
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: master1:/opt/tier1data2019/brick
Brick2: master2:/opt/tier1data2019/brick
Brick3: master3:/opt/tier1data2019/brick
master1 |
master2 | ------------------------------geo-replication----------------------------- | drtier1data
master3 |
We added the master3 node a few months back; the initial setup consisted of two
master nodes and one geo-replicated slave (drtier1data).
Our geo-replication was functioning well with the initial two master nodes
(master1 and master2), where master1 was active and master2 was in passive mode.
However, today, we started experiencing issues where geo-replication suddenly
stopped and became stuck in a loop of Initializing..., Active.. Faulty on
master1, while master2 remained in passive mode.
Upon checking the gsyncd.log on the master1 node, we observed the following
error (please refer to the attached logs for more details):
E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception]
<top>: Gluster Mount process exited [{error=ENOTCONN}]
# gluster volume geo-replication tier1data status
MASTER NODE    MASTER VOL    MASTER BRICK                SLAVE USER    SLAVE                             SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------
master1        tier1data     /opt/tier1data2019/brick    root          ssh://drtier1data::drtier1data    N/A           Faulty     N/A             N/A
master2        tier1data     /opt/tier1data2019/brick    root          ssh://drtier1data::drtier1data                  Passive    N/A             N/A
Suspecting an issue on the drtier1data (slave), I attempted to restart Gluster
on the slave node, and also tried to restart the drtier1data server, without any luck.
After that I tried the following command to get the Primary-log-file for
geo-replication on master1, and got the following error.
# gluster volume geo-replication tier1data drtier1data::drtier1data config
log-file
Staging failed on master3. Error: Geo-replication session between tier1data and
drtier1data::drtier1data does not exist.
geo-replication command failed
Master3 was the new node added a few months back, but geo-replication was
working until today, and we never added this node under geo-replication.
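(That staging error usually just means master3 has no copy of the session metadata; a quick way to confirm is to compare the geo-replication state directory across the nodes, e.g.:
# ls /var/lib/glusterd/geo-replication/
run on master1, master2 and master3. On the nodes where the session exists there should be a tier1data_drtier1data_drtier1data directory with a gsyncd.conf inside; if it is missing on master3, that matches the error above.)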
After that, I forcefully stopped the geo-replication, thinking that restarting
geo-replication might fix the issue. However, now the geo-replication is not
starting and is giving the same error.
# gluster volume geo-replication tier1data drtier1data::drtier1data start force
Staging failed on master3. Error: Geo-replication session between tier1data and
drtier1data::drtier1data does not exist.
geo-replication command failed
Can anyone please suggest what I should do next to resolve this issue? As there
is 5TB of data in this volume, I don't want to resync the entire data to
drtier1data. Instead, I want to resume the sync from where it last stopped.
Thanks in advance for any guidance/help.
Kind regards,
Anant
________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge:
https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
  
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.gluster.org/pipermail/gluster-users/attachments/20240128/6c1ea6cf/attachment.html>
Anant Saraswat
2024-Jan-29  00:20 UTC
[Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi Strahil,
As mentioned in my last email, I have copied the gluster public key from master3
to the secondary server, and I can now ssh from all master nodes to the secondary
server, but I am still getting the same error.
[root at master1 geo-replication]# ssh root at drtier1data -i
/var/lib/glusterd/geo-replication/secret.pem
Last login: Mon Jan 29 00:14:32 2024 from
[root at drtier1data ~]#
[root at master2 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root at
drtier1data
Last login: Mon Jan 29 00:02:34 2024 from
[root at drtier1data ~]#
[root at master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root at
drtier1data
Last login: Mon Jan 29 00:14:41 2024 from
[root at drtier1data ~]#
Thanks,
Anant
________________________________
From: Strahil Nikolov <hunter86_bg at yahoo.com>
Sent: 28 January 2024 10:07 PM
To: Anant Saraswat <anant.saraswat at techblue.co.uk>; gluster-users at
gluster.org <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few
seconds
EXTERNAL: Do not click links or open attachments if you do not recognize the
sender.
Gluster doesn't use the ssh key in /root/.ssh, thus you need to exchange the
public key that corresponds to /var/lib/glusterd/geo-replication/secret.pem . If
you don't know the pub key, google how to obtain it from the private key.
Ensure that all hosts can ssh to the secondary before proceeding with the
troubleshooting.
Best Regards,
Strahil Nikolov
On Sun, Jan 28, 2024 at 15:58, Anant Saraswat
<anant.saraswat at techblue.co.uk> wrote:
Hi All,
I have now copied  /var/lib/glusterd/geo-replication/secret.pem.pub  (public
key) from master3 to drtier1data /root/.ssh/authorized_keys, and now I can ssh
from master node3 to drtier1data using the georep key
(/var/lib/glusterd/geo-replication/secret.pem).
But I am still getting the same error, and geo-replication is getting faulty
again and again.
[2024-01-28 13:46:38.897683] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449598}]
[2024-01-28 13:46:38.922491] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:46:38.923127] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:46:38.923313] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449598},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:46:39.973584] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:46:40.98970] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount
process exited [{error=ENOTCONN}]
[2024-01-28 13:46:40.757691] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:46:40.766860] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:46:50.793311] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:46:50.793469] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:46:50.874474] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:46:52.659114] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7844}]
[2024-01-28 13:46:52.659461] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:46:53.698769] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0392}]
[2024-01-28 13:46:53.698984] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn
successful. Acknowledging back to monitor
[2024-01-28 13:46:55.831999] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:46:55.832354] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449615}]
[2024-01-28 13:46:55.854684] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:46:55.855251] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:46:55.855419] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449615},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:46:56.905496] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:46:57.38262] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount
process exited [{error=ENOTCONN}]
[2024-01-28 13:46:57.704128] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:46:57.706743] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:07.741438] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:07.741582] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:07.821284] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:47:09.573661] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7521}]
[2024-01-28 13:47:09.573955] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:47:10.612173] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0381}]
[2024-01-28 13:47:10.612359] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn
successful. Acknowledging back to monitor
[2024-01-28 13:47:12.751856] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:47:12.752237] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449632}]
[2024-01-28 13:47:12.759138] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:47:12.759690] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:47:12.759868] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449632},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:47:13.810321] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:47:13.924068] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount
process exited [{error=ENOTCONN}]
[2024-01-28 13:47:14.617663] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:47:14.620035] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:24.646013] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:24.646157] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:24.725510] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:47:26.491939] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7662}]
[2024-01-28 13:47:26.492235] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:47:27.530852] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0385}]
[2024-01-28 13:47:27.531036] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn
successful. Acknowledging back to monitor
[2024-01-28 13:47:29.670099] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:47:29.670640] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449649}]
[2024-01-28 13:47:29.696144] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:47:29.696709] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:47:29.696899] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449649},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:47:30.751127] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:47:30.885824] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount
process exited [{error=ENOTCONN}]
[2024-01-28 13:47:31.535252] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:47:31.538450] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:41.564276] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:41.564426] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:41.645110] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:47:43.435830] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7904}]
[2024-01-28 13:47:43.436285] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:47:44.475671] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0393}]
[2024-01-28 13:47:44.475865] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn
successful. Acknowledging back to monitor
[2024-01-28 13:47:46.630478] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:47:46.630924] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449666}]
[2024-01-28 13:47:46.655069] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:47:46.655752] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:47:46.655926] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449666},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:47:47.706875] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:47:47.834996] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount
process exited [{error=ENOTCONN}]
[2024-01-28 13:47:48.480822] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:47:48.491306] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:58.518263] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:58.518412] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:58.601096] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:48:00.355000] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7537}]
[2024-01-28 13:48:00.355345] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:48:01.395025] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0396}]
[2024-01-28 13:48:01.395212] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn
successful. Acknowledging back to monitor
[2024-01-28 13:48:03.541059] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:48:03.541481] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449683}]
[2024-01-28 13:48:03.567552] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:48:03.568172] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:48:03.568376] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449683},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:48:04.621488] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:48:04.742268] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount
process exited [{error=ENOTCONN}]
[2024-01-28 13:48:04.919335] I [master(worker
/opt/tier1data2019/brick):2013:syncjob] Syncer: Sync Time Taken [{job=3},
{num_files=10}, {return_code=3}, {duration=0.0180}]
[2024-01-28 13:48:04.919919] E [syncdutils(worker
/opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync
-aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs
--existing --xattrs --acls --ignore-missing-args . -e ssh
-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-zo_ev6yu/75785990b3233f5dbbab9f43cc3ed895.sock
drtier1data:/proc/799165/cwd}, {error=3}]
[2024-01-28 13:48:05.399226] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:48:05.403931] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:48:15.430175] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:15.430308] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:15.510770] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:48:17.240311] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7294}]
[2024-01-28 13:48:17.240509] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:48:18.279007] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0384}]
[2024-01-28 13:48:18.279195] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn
successful. Acknowledging back to monitor
[2024-01-28 13:48:20.455937] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:48:20.456274] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449700}]
[2024-01-28 13:48:20.464288] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:48:20.464807] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:48:20.464970] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449700},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:48:21.514201] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:48:21.644609] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount
process exited [{error=ENOTCONN}]
[2024-01-28 13:48:22.284920] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:48:22.286189] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:48:32.312378] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:32.312526] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:32.393484] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:48:34.91825] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.6981}]
[2024-01-28 13:48:34.92130] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
Thanks,
Anant
________________________________
From: Anant Saraswat <anant.saraswat at techblue.co.uk>
Sent: 28 January 2024 1:33 AM
To: Strahil Nikolov <hunter86_bg at yahoo.com>; gluster-users at
gluster.org <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few
seconds
Hi @Strahil Nikolov,
I have checked the ssh connection from all the master servers and I can ssh
drtier1data from master1 and master2 server(old master servers), but I am unable
to ssh drtier1data from master3 (new node).
[root at master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root at
drtier1data
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325,
in <module>
    main()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 259,
in main
    if args.subcmd in ("worker"):
TypeError: 'in <string>' requires string as left operand, not
NoneType
Connection to drtier1data closed.
But I am able to ssh  drtier1data from master3 without using the georep key.
[root at master3 ~]# ssh  root at drtier1data
Last login: Sun Jan 28 01:16:25 2024 from 87.246.74.32
[root at drtier1data ~]#
Also, today I restarted the gluster server on master1 as geo-replication is
trying to be active from master1 server, and sometimes I am getting the
following error in gsyncd.log
[2024-01-28 01:27:24.722663] E [syncdutils(worker
/opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync
-aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs
--existing --xattrs --acls --ignore-missing-args . -e ssh
-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-0exuoeg7/75785990b3233f5dbbab9f43cc3ed895.sock
drtier1data:/proc/553418/cwd}, {error=3}]
Many thanks,
Anant
________________________________
From: Strahil Nikolov <hunter86_bg at yahoo.com>
Sent: 27 January 2024 5:25 AM
To: gluster-users at gluster.org <gluster-users at gluster.org>; Anant
Saraswat <anant.saraswat at techblue.co.uk>
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few
seconds
EXTERNAL: Do not click links or open attachments if you do not recognize the
sender.
Don't forget to test with the georep key. I think it was
/var/lib/glusterd/geo-replication/secret.pem
Best Regards,
Strahil Nikolov
On Saturday, 27 January 2024 at 07:24:07 GMT+2, Strahil Nikolov
<hunter86_bg at yahoo.com> wrote:
Hi Anant,
I would first start checking if you can do ssh from all masters to the slave
node. If you haven't set up a dedicated user for the session, then gluster is
using root.
Best Regards,
Strahil Nikolov
On Friday, 26 January 2024 at 18:07:59 GMT+2, Anant Saraswat
<anant.saraswat at techblue.co.uk> wrote:
Hi All,
I have run the following commands on master3, and that has added master3 to
geo-replication.
gluster system:: execute gsec_create
gluster volume geo-replication tier1data drtier1data::drtier1data create
push-pem force
gluster volume geo-replication tier1data drtier1data::drtier1data stop
gluster volume geo-replication tier1data drtier1data::drtier1data start
Now I am able to start the geo-replication, but I am getting the same error.
[2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-24 19:51:24.158021] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-24 19:51:25.951998] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7938}]
[2024-01-24 19:51:25.952292] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-24 19:51:26.986974] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0346}]
[2024-01-24 19:51:26.987137] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn
successful. Acknowledging back to monitor
[2024-01-24 19:51:29.139131] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-24 19:51:29.139531] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706125889}]
[2024-01-24 19:51:29.173877] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-24 19:51:29.174407] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-24 19:51:29.174558] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889},
{entry_stime=(1705935991, 0)}]
[2024-01-24 19:51:30.251965] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-24 19:51:30.376715] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount
process exited [{error=ENOTCONN}]
[2024-01-24 19:51:30.991856] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-24 19:51:30.993608] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
Any idea why it's stuck in this loop?
Thanks,
Anant
________________________________
From: Gluster-users <gluster-users-bounces at gluster.org> on behalf of
Anant Saraswat <anant.saraswat at techblue.co.uk>
Sent: 22 January 2024 9:00 PM
To: gluster-users at gluster.org <gluster-users at gluster.org>
Subject: [Gluster-users] Geo-replication status is getting Faulty after few
seconds
EXTERNAL: Do not click links or open attachments if you do not recognize the
sender.
Hi There,
We have a Gluster setup with three master nodes in replicated mode and one slave
node with geo-replication.
# gluster volume info
Volume Name: tier1data
Type: Replicate
Volume ID: 93c45c14-f700-4d50-962b-7653be471e27
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: master1:/opt/tier1data2019/brick
Brick2: master2:/opt/tier1data2019/brick
Brick3: master3:/opt/tier1data2019/brick
master1 |
master2 | ------------------------------geo-replication----------------------------- | drtier1data
master3 |
We added the master3 node a few months back; the initial setup consisted of two
master nodes and one geo-replicated slave (drtier1data).
Our geo-replication was functioning well with the initial two master nodes
(master1 and master2), where master1 was active and master2 was in passive mode.
However, today, we started experiencing issues where geo-replication suddenly
stopped and became stuck in a loop of Initializing..., Active.. Faulty on
master1, while master2 remained in passive mode.
Upon checking the gsyncd.log on the master1 node, we observed the following
error (please refer to the attached logs for more details):
E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception]
<top>: Gluster Mount process exited [{error=ENOTCONN}]
# gluster volume geo-replication tier1data status
MASTER NODE    MASTER VOL    MASTER BRICK                SLAVE USER    SLAVE                             SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------
master1        tier1data     /opt/tier1data2019/brick    root          ssh://drtier1data::drtier1data    N/A           Faulty     N/A             N/A
master2        tier1data     /opt/tier1data2019/brick    root          ssh://drtier1data::drtier1data                  Passive    N/A             N/A
Suspecting an issue on the drtier1data (slave), I attempted to restart Gluster
on the slave node, and also tried to restart the drtier1data server, without any luck.
After that I tried the following command to get the Primary-log-file for
geo-replication on master1, and got the following error.
# gluster volume geo-replication tier1data drtier1data::drtier1data config
log-file
Staging failed on master3. Error: Geo-replication session between tier1data and
drtier1data::drtier1data does not exist.
geo-replication command failed
Master3 was the new node added a few months back, but geo-replication was
working until today, and we never added this node under geo-replication.
After that, I forcefully stopped the geo-replication, thinking that restarting
geo-replication might fix the issue. However, now the geo-replication is not
starting and is giving the same error.
# gluster volume geo-replication tier1data drtier1data::drtier1data start force
Staging failed on master3. Error: Geo-replication session between tier1data and
drtier1data::drtier1data does not exist.
geo-replication command failed
Can anyone please suggest what I should do next to resolve this issue? As there
is 5TB of data in this volume, I don't want to resync the entire data to
drtier1data. Instead, I want to resume the sync from where it last stopped.
Thanks in advance for any guidance/help.
Kind regards,
Anant
________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge:
https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.gluster.org/pipermail/gluster-users/attachments/20240129/20b8d71a/attachment.html>