Marcus Pedersén
2018-Jul-23 10:34 UTC
[Gluster-users] Upgrade to 4.1.1 geo-replication does not work
Hi Sunny,
ldconfig -p /usr/local/lib | grep libgf
Output:
libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0

So that seems to be alright, right?

Best regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

On 23 July 2018 11:17, Sunny Kumar <sunkumar at redhat.com> wrote:

Hi Marcus,

On Wed, Jul 18, 2018 at 4:08 PM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
>
> Hi Kotresh,
>
> I ran:
>
> #ldconfig /usr/lib
can you do -
ldconfig /usr/local/lib
Output:
>
> on all nodes in both clusters but I still get the same error.
>
> What to do?
>
>
> Output for:
>
> # ldconfig -p /usr/lib | grep libgf
>
> libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
>
>
> I read somewhere that you could change some settings for geo-replication to speed up sync.
>
> I cannot remember where I saw that or what the config parameters were.
>
> When geo-replication works I have 30 TB on the master cluster that has to be synced to the slave nodes,
>
> and it will take a while before the slave nodes have caught up.
>
>
> Thanks and regards
>
> Marcus Pedersén
>
>
> Part of gsyncd.log:
>
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
> raise res
> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
> [2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> [2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-18 10:24:03.335380] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-07-18 10:24:03.343605] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-07-18 10:24:04.881148] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5373
> [2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0853
> [2018-07-18 10:24:05.968028] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor > [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed: > Traceback (most recent call last): > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker > res = getattr(self.obj, rmeth)(*in_data[2:]) > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init > return Changes.cl_init() > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__ > from libgfchangelog import Changes as LChanges > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module> > class Changes(object): > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes > use_errno=True) > File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__ > self._handle = _dlopen(self._name, mode) > OSError: libgfchangelog.so: cannot open shared object file: No such file or directory > [2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=1146:139672481965888:1531909445.98 method=init error=OSError > [2018-07-18 10:24:05.984747] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL: > Traceback (most recent call last): > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main > func(args) > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker > local.service_loop(remote) > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop > changelog_agent.init() > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__ > return self.ins(self.meth, *a) > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__ > raise res > OSError: libgfchangelog.so: cannot open shared object file: No such file or directoryI think then you will not see this.> [2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF. > [2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster > > > ________________________________ > Fr?n: Kotresh Hiremath Ravishankar <khiremat at redhat.com> > Skickat: den 18 juli 2018 06:05 > Till: Marcus Peders?n > Kopia: gluster-users at gluster.org > ?mne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work > > Hi Marcus, > > Well there is nothing wrong in setting up a symlink for gluster binary location, but > there is a geo-rep command to set it so that gsyncd will search there. > > To set on master > #gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location> > > To set on slave > #gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location> > > Thanks, > Kotresh HR > > > On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote: >> >> Hi Marcus, >> >> I am testing out 4.1 myself and I will have some update today. >> For this particular traceback, gsyncd is not able to find the library. >> Is it the rpm install? If so, gluster libraries would be in /usr/lib. >> Please run the cmd below. >> >> #ldconfig /usr/lib >> #ldconfig -p /usr/lib | grep libgf (This should list libgfchangelog.so) >> >> Geo-rep should be fixed automatically. 
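The traceback above shows exactly where this fails: gsyncd's libgfchangelog.py dlopen()s the unversioned name libgfchangelog.so through ctypes, while the ldconfig cache only lists the versioned libgfchangelog.so.0. A minimal standalone check, a sketch only (run it with the same Python interpreter gsyncd uses), that reproduces the load outside of gsyncd:

    # Try the same unversioned library name the traceback shows failing.
    from ctypes import CDLL, get_errno

    try:
        CDLL("libgfchangelog.so", use_errno=True)
        print("libgfchangelog.so loaded fine")
    except OSError as exc:
        print("dlopen failed: %s (errno=%s)" % (exc, get_errno()))

If this fails while libgfchangelog.so.0 is installed, the unversioned .so name is simply not resolvable on the library path; creating that symlink (or installing the package that ships it) and re-running ldconfig is the likely remedy — an assumption about the packaging, not something confirmed in the thread.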
>> >> Thanks, >> Kotresh HR >> >> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Peders?n <marcus.pedersen at slu.se> wrote: >>> >>> Hi again, >>> >>> I continue to do some testing, but now I have come to a stage where I need help. >>> >>> >>> gsyncd.log was complaining about that /usr/local/sbin/gluster was missing so I made a link. >>> >>> After that /usr/local/sbin/glusterfs was missing so I made a link there as well. >>> >>> Both links were done on all slave nodes. >>> >>> >>> Now I have a new error that I can not resolve myself. >>> >>> It can not open libgfchangelog.so >>> >>> >>> Many thanks! >>> >>> Regards >>> >>> Marcus Peders?n >>> >>> >>> Part of gsyncd.log: >>> >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory >>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF. >>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster >>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000 >>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf >>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf >>> [2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining... >>> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave... >>> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6151 >>> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally... >>> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901 >>> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. 
Acknowledging back to monitor >>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed: >>> Traceback (most recent call last): >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker >>> res = getattr(self.obj, rmeth)(*in_data[2:]) >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init >>> return Changes.cl_init() >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__ >>> from libgfchangelog import Changes as LChanges >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module> >>> class Changes(object): >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes >>> use_errno=True) >>> File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__ >>> self._handle = _dlopen(self._name, mode) >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory >>> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed call=6078:139982918485824:1531855940.27 method=init error=OSError >>> [2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL: >>> Traceback (most recent call last): >>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main >>> func(args) >>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker >>> local.service_loop(remote) >>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop >>> changelog_agent.init() >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__ >>> return self.ins(self.meth, *a) >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__ >>> raise res >>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory >>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF. >>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster >>> >>> >>> >>> ________________________________ >>> Fr?n: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> f?r Marcus Peders?n <marcus.pedersen at slu.se> >>> Skickat: den 16 juli 2018 21:59 >>> Till: khiremat at redhat.com >>> >>> Kopia: gluster-users at gluster.org >>> ?mne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work >>> >>> >>> Hi Kotresh, >>> >>> I have been testing for a bit and as you can see from the logs I sent before permission is denied for geouser on slave node on file: >>> >>> /var/log/glusterfs/cli.log >>> >>> I have turned selinux off and just for testing I changed permissions on /var/log/glusterfs/cli.log so geouser can access it. >>> >>> Starting geo-replication after that gives response successful but all nodes get status Faulty. 
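A quick way to re-check what is described above — whether the mountbroker user can actually open the slave's CLI log — is to drop privileges to that account and try the open directly. A small sketch (run as root on a slave node; the user name and path are the ones mentioned in this thread):

    # Drop to the unprivileged geo-rep user and test append access to cli.log.
    import os
    import pwd

    USER = "geouser"                        # mountbroker user from this thread
    PATH = "/var/log/glusterfs/cli.log"

    pw = pwd.getpwnam(USER)
    os.setgid(pw.pw_gid)                    # set group before dropping uid
    os.setuid(pw.pw_uid)
    try:
        with open(PATH, "a"):
            print("%s can open %s for append" % (USER, PATH))
    except (IOError, OSError) as exc:
        print("%s cannot open %s: %s" % (USER, PATH, exc))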
>>> >>> >>> If I run: gluster-mountbroker status >>> >>> I get: >>> >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+ >>> | NODE | NODE STATUS | MOUNT ROOT | GROUP | USERS | >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+ >>> | urd-gds-geo-001.hgen.slu.se | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) | >>> | urd-gds-geo-002 | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) | >>> | localhost | UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume) | >>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+ >>> >>> >>> and that is all nodes on slave cluster, so mountbroker seems ok. >>> >>> >>> gsyncd.log logs an error about /usr/local/sbin/gluster is missing. >>> >>> That is correct cos gluster is in /sbin/gluster and /urs/sbin/gluster >>> >>> Another error is that SSH between master and slave is broken, >>> >>> but now when I have changed permission on /var/log/glusterfs/cli.log I can run: >>> >>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 geouser at urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume >>> >>> as geouser and that works, which means that the ssh connection works. >>> >>> >>> Is the permissions on /var/log/glusterfs/cli.log changed when geo-replication is setup? >>> >>> Is gluster supposed to be in /usr/local/sbin/gluster? >>> >>> >>> Do I have any options or should I remove current geo-replication and create a new? >>> >>> How much do I need to clean up before creating a new geo-replication? >>> >>> In that case can I pause geo-replication, mount slave cluster on master cluster and run rsync , just to speed up transfer of files? >>> >>> >>> Many thanks in advance! >>> >>> Marcus Peders?n >>> >>> >>> Part from the gsyncd.log: >>> >>> [2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\ >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock geouser at urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume geouser at urd-gds-geo-001::urd-gds-volu\ >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\ >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1 >>> [2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory) >>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF. 
>>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster >>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000 >>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf >>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf >>> [2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining... >>> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave... >>> [2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken >>> [2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\ >>> n/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock geouser at urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume geouser at urd-gds-geo-001::urd-gds-volu\ >>> me --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\ >>> ut 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1 >>> [2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory) >>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF. >>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster >>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000 >>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf >>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf >>> [2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining... >>> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave... 
>>> [2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken >>> >>> ________________________________ >>> Fr?n: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> f?r Marcus Peders?n <marcus.pedersen at slu.se> >>> Skickat: den 13 juli 2018 14:50 >>> Till: Kotresh Hiremath Ravishankar >>> Kopia: gluster-users at gluster.org >>> ?mne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work >>> >>> Hi Kotresh, >>> Yes, all nodes have the same version 4.1.1 both master and slave. >>> All glusterd are crashing on the master side. >>> Will send logs tonight. >>> >>> Thanks, >>> Marcus >>> >>> ################ >>> Marcus Peders?n >>> Systemadministrator >>> Interbull Centre >>> ################ >>> Sent from my phone >>> ################ >>> >>> Den 13 juli 2018 11:28 skrev Kotresh Hiremath Ravishankar <khiremat at redhat.com>: >>> >>> Hi Marcus, >>> >>> Is the gluster geo-rep version is same on both master and slave? >>> >>> Thanks, >>> Kotresh HR >>> >>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Peders?n <marcus.pedersen at slu.se> wrote: >>> >>> Hi Kotresh, >>> >>> i have replaced both files (gsyncdconfig.py and repce.py) in all nodes both master and slave. >>> >>> I rebooted all servers but geo-replication status is still Stopped. >>> >>> I tried to start geo-replication with response Successful but status still show Stopped on all nodes. >>> >>> Nothing has been written to geo-replication logs since I sent the tail of the log. >>> >>> So I do not know what info to provide? >>> >>> >>> Please, help me to find a way to solve this. >>> >>> >>> Thanks! >>> >>> >>> Regards >>> >>> Marcus >>> >>> >>> ________________________________ >>> Fr?n: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> f?r Marcus Peders?n <marcus.pedersen at slu.se> >>> Skickat: den 12 juli 2018 08:51 >>> Till: Kotresh Hiremath Ravishankar >>> Kopia: gluster-users at gluster.org >>> ?mne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work >>> >>> Thanks Kotresh, >>> I installed through the official centos channel, centos-release-gluster41. >>> Isn't this fix included in centos install? >>> I will have a look, test it tonight and come back to you! >>> >>> Thanks a lot! >>> >>> Regards >>> Marcus >>> >>> ################ >>> Marcus Peders?n >>> Systemadministrator >>> Interbull Centre >>> ################ >>> Sent from my phone >>> ################ >>> >>> Den 12 juli 2018 07:41 skrev Kotresh Hiremath Ravishankar <khiremat at redhat.com>: >>> >>> Hi Marcus, >>> >>> I think the fix [1] is needed in 4.1 >>> Could you please this out and let us know if that works for you? >>> >>> [1] https://review.gluster.org/#/c/20207/ >>> >>> Thanks, >>> Kotresh HR >>> >>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Peders?n <marcus.pedersen at slu.se> wrote: >>> >>> Hi all, >>> >>> I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions for offline upgrade. >>> >>> I upgraded geo-replication side first 1 x (2+1) and the master side after that 2 x (2+1). >>> >>> Both clusters works the way they should on their own. >>> >>> After upgrade on master side status for all geo-replication nodes is Stopped. >>> >>> I tried to start the geo-replication from master node and response back was started successfully. >>> >>> Status again .... Stopped >>> >>> Tried to start again and get response started successfully, after that all glusterd crashed on all master nodes. 
>>> >>> After a restart of all glusterd the master cluster was up again. >>> >>> Status for geo-replication is still Stopped and every try to start it after this gives the response successful but still status Stopped. >>> >>> >>> Please help me get the geo-replication up and running again. >>> >>> >>> Best regards >>> >>> Marcus Peders?n >>> >>> >>> Part of geo-replication log from master node: >>> >>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining... >>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave... >>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken >>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\ >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock geouser at urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\ >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2 >>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h] >>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> >>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\ >>> elete} >>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ... >>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\ >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete') >>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting. >>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF. >>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting. >>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster >>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://geouser at urd-gds-geo-000:gluster://localhost:urd-gds-volume >>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave... >>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining... 
>>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken >>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\ >>> .pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock geouser at urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\ >>> 2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2 >>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h] >>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> >>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\ >>> elete} >>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ... >>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\ >>> or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete') >>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting. >>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF. >>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting. >>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster >>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://geouser at urd-gds-geo-000:gluster://localhost:urd-gds-volume >>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster >>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster >>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent >>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL: >>> Traceback (most recent call last): >>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap >>> except: >>> File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon >>> sys.exit() >>> TypeError: 'int' object is not iterable >>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting. >>> >>> --- >>> N?r du skickar e-post till SLU s? inneb?r detta att SLU behandlar dina personuppgifter. F?r att l?sa mer om hur detta g?r till, klicka h?r >>> E-mailing SLU will result in SLU processing your personal data. 
For more information on how this is done, click here
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>> --
>>> Thanks and Regards,
>>> Kotresh H R
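The "invalid choice" lines in the log above are the slave's gsyncd.py being started with the old 3.x-style command line (--session-owner <uuid> ... --listen) while the 4.1 parser expects one of the listed subcommands as its first positional argument. The following is illustrative only — it is not gsyncd's real parser — it just shows how an argparse subcommand CLI produces exactly that error when an unexpected token lands in the subcommand slot:

    # Illustrative sketch of a subcommand-style parser like the one in the log.
    import argparse

    parser = argparse.ArgumentParser(prog="gsyncd.py")
    subcmds = parser.add_subparsers(dest="subcmd")
    for name in ("monitor-status", "monitor", "worker", "agent", "slave",
                 "status", "config-check", "config-get", "config-set",
                 "config-reset", "voluuidget", "delete"):
        subcmds.add_parser(name)

    # An old-style invocation does not begin with a subcommand, so the first
    # stray token is rejected and the parser exits, e.g.:
    parser.parse_args(["5e94eb7d-219f-4741-a179-d4ae6b50c7ee"])
    # gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-...'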
Sunny Kumar
2018-Jul-23 10:53 UTC
[Gluster-users] Upgrade to 4.1.1 geo-replication does not work
Hi Marcus,

On Mon, Jul 23, 2018 at 4:04 PM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
>
> Hi Sunny,
> ldconfig -p /usr/local/lib | grep libgf
> Output:
> libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
>
> So that seems to be alright, right?
>
Yes, this seems right. Can you share the gsyncd.log again?

> Best regards
> Marcus
>
> ################
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> ################
> Sent from my phone
> ################

- Sunny