thr3ads.net - Gluster users - [Gluster-users] Upgrade to 4.1.1 geo-replication does not work [Jul 2018]

Kotresh Hiremath Ravishankar

2018-Jul-18 04:05 UTC

[Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Hi Marcus,

There is nothing wrong with setting up a symlink for the gluster binary location, but there is a geo-rep command to set it so that gsyncd will search there.

To set on master:
# gluster vol geo-rep <mastervol> <slave-vol> config gluster-command-dir <gluster-binary-location>

To set on slave:
# gluster vol geo-rep <mastervol> <slave-vol> config slave-gluster-command-dir <gluster-binary-location>
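The two config commands above differ only in the option name, so they can be generated together. The following is a minimal sketch, not part of the original reply: the helper names and the example values ("mastervol", "slave-vol", "/usr/sbin/") are placeholders chosen for illustration, and the script only prints the commands rather than running them.

```python
import shlex


def georep_config_cmd(master_vol, slave, option, value):
    """Build argv for: gluster vol geo-rep <master> <slave> config <option> <value>."""
    return ["gluster", "vol", "geo-rep", master_vol, slave, "config", option, value]


def command_dir_cmds(master_vol, slave, master_dir, slave_dir):
    """Return both commands: master side first, then slave side."""
    return [
        georep_config_cmd(master_vol, slave, "gluster-command-dir", master_dir),
        georep_config_cmd(master_vol, slave, "slave-gluster-command-dir", slave_dir),
    ]


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute the real session names
    # and the directory that actually contains the gluster binary.
    for argv in command_dir_cmds("mastervol", "slave-vol", "/usr/sbin/", "/usr/sbin/"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing instead of executing keeps the sketch safe to run on any machine; the output can be reviewed and pasted into a root shell on a master node.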

Thanks,
Kotresh HR


On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
> Hi Marcus,
>
> I am testing out 4.1 myself and I will have some update today.
> For this particular traceback, gsyncd is not able to find the library.
> Is this an RPM install? If so, the gluster libraries would be in /usr/lib.
> Please run the commands below.
>
> # ldconfig /usr/lib
> # ldconfig -p /usr/lib | grep libgf  (This should list libgfchangelog.so)
>
> Geo-rep should be fixed automatically.
>
> Thanks,
> Kotresh HR
>
> On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:
>
>> Hi again,
>>
>> I have continued testing, but I have now reached a point where I need help.
>>
>> gsyncd.log was complaining that /usr/local/sbin/gluster was missing, so I made a link.
>> After that, /usr/local/sbin/glusterfs was missing, so I made a link there as well.
>> Both links were made on all slave nodes.
>>
>> Now I have a new error that I cannot resolve myself: it cannot open libgfchangelog.so.
>>
>> Many thanks!
>>
>> Regards
>> Marcus Pedersén
>>
>>
>>
>> Part of gsyncd.log:
>>
>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
>> [2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase     brick=/urd-gds/gluster
>> [2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
>> [2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file      path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file     path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-17 19:32:17.542363] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>> [2018-07-17 19:32:17.550894] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2018-07-17 19:32:19.166246] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established.        duration=1.6151
>> [2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
>> [2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0901
>> [2018-07-17 19:32:20.257921] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>> [2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
>>     res = getattr(self.obj, rmeth)(*in_data[2:])
>>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, in init
>>     return Changes.cl_init()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, in __getattr__
>>     from libgfchangelog import Changes as LChanges
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, in <module>
>>     class Changes(object):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, in Changes
>>     use_errno=True)
>>   File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
>>     self._handle = _dlopen(self._name, mode)
>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
>> [2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] RepceClient: call failed   call=6078:139982918485824:1531855940.27 method=init     error=OSError
>> [2018-07-17 19:32:20.275192] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
>>     func(args)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
>>     local.service_loop(remote)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in service_loop
>>     changelog_agent.init()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in __call__
>>     return self.ins(self.meth, *a)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in __call__
>>     raise res
>> OSError: libgfchangelog.so: cannot open shared object file: No such file or directory
>> [2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase     brick=/urd-gds/gluster
>>
>>
>>
>> ------------------------------
>> *From:* gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
>> *Sent:* 16 July 2018 21:59
>> *To:* khiremat at redhat.com
>> *Cc:* gluster-users at gluster.org
>> *Subject:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>>
>>
>> Hi Kotresh,
>>
>> I have been testing for a bit and, as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file /var/log/glusterfs/cli.log.
>>
>> I have turned SELinux off and, just for testing, changed the permissions on /var/log/glusterfs/cli.log so that geouser can access it.
>>
>> Starting geo-replication after that reports success, but all nodes get the status Faulty.
>>
>> If I run gluster-mountbroker status, I get:
>>
>>
>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
>> |             NODE            | NODE STATUS |         MOUNT ROOT        |    GROUP     |          USERS           |
>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
>> | urd-gds-geo-001.hgen.slu.se |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |
>> |       urd-gds-geo-002       |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |
>> |          localhost          |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |
>> +-----------------------------+-------------+---------------------------+--------------+--------------------------+
>>
>>
>> and those are all the nodes in the slave cluster, so mountbroker seems OK.
>>
>> gsyncd.log logs an error that /usr/local/sbin/gluster is missing.
>> That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster.
>>
>> Another error is that SSH between master and slave is broken, but now that I have changed the permissions on /var/log/glusterfs/cli.log I can run:
>>
>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 geouser@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume
>>
>> as geouser and that works, which means that the SSH connection works.
>>
>> Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is set up?
>> Is gluster supposed to be in /usr/local/sbin/gluster?
>>
>> Do I have any options, or should I remove the current geo-replication session and create a new one?
>> How much do I need to clean up before creating a new geo-replication session?
>> In that case, can I pause geo-replication, mount the slave cluster on the master cluster and run rsync, just to speed up the transfer of files?
>>
>> Many thanks in advance!
>>
>> Marcus Pedersén
>>
>>
>> Part of gsyncd.log:
>>
>> [2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error    cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock geouser@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
>> [2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
>> [2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection       brick=/urd-gds/gluster
>> [2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker    brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
>> [2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file      path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file       path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>> [2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
>> [2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error   cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock geouser@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
>> [2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
>> [2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection       brick=/urd-gds/gluster
>> [2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
>> [2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file     path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file      path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>> [2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
>>
>> ------------------------------
>> *From:* gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
>> *Sent:* 13 July 2018 14:50
>> *To:* Kotresh Hiremath Ravishankar
>> *Cc:* gluster-users at gluster.org
>> *Subject:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>>
>> Hi Kotresh,
>> Yes, all nodes run the same version, 4.1.1, on both master and slave.
>> glusterd is crashing on all nodes on the master side.
>> I will send logs tonight.
>>
>> Thanks,
>> Marcus
>>
>> ################
>> Marcus Pedersén
>> System administrator
>> Interbull Centre
>> ################
>> Sent from my phone
>> ################
>>
>> On 13 July 2018 at 11:28, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>>
>> Hi Marcus,
>>
>> Is the gluster geo-rep version the same on both master and slave?
>>
>> Thanks,
>> Kotresh HR
>>
>> On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:
>>
>> Hi Kotresh,
>>
>> I have replaced both files (gsyncdconfig.py <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py> and repce.py <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>) on all nodes, both master and slave.
>>
>> I rebooted all servers, but the geo-replication status is still Stopped.
>> I tried to start geo-replication and got the response Successful, but the status still shows Stopped on all nodes.
>>
>> Nothing has been written to the geo-replication logs since I sent the tail of the log, so I do not know what information to provide.
>>
>> Please help me find a way to solve this.
>>
>> Thanks!
>>
>> Regards
>> Marcus
>>
>>
>> ------------------------------
>> *From:* gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
>> *Sent:* 12 July 2018 08:51
>> *To:* Kotresh Hiremath Ravishankar
>> *Cc:* gluster-users at gluster.org
>> *Subject:* Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work
>>
>> Thanks Kotresh,
>> I installed through the official CentOS channel, centos-release-gluster41.
>> Isn't this fix included in the CentOS install?
>> I will have a look, test it tonight and come back to you!
>>
>> Thanks a lot!
>>
>> Regards
>> Marcus
>>
>> ################
>> Marcus Pedersén
>> System administrator
>> Interbull Centre
>> ################
>> Sent from my phone
>> ################
>>
>> On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>>
>> Hi Marcus,
>>
>> I think the fix [1] is needed in 4.1.
>> Could you please try this out and let us know if that works for you?
>>
>> [1] https://review.gluster.org/#/c/20207/
>>
>> Thanks,
>> Kotresh HR
>>
>> On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:
>>
>> Hi all,
>>
>> I have upgraded from 3.12.9 to 4.1.1, following the upgrade instructions for an offline upgrade.
>>
>> I upgraded the geo-replication side first, 1 x (2+1), and the master side after that, 2 x (2+1).
>> Both clusters work the way they should on their own.
>>
>> After the upgrade on the master side, the status for all geo-replication nodes is Stopped.
>> I tried to start geo-replication from a master node and the response was that it started successfully.
>> The status was again Stopped.
>>
>> I tried to start it again and got the response that it started successfully; after that, glusterd crashed on all master nodes.
>> After a restart of glusterd on all nodes, the master cluster was up again.
>>
>> The status for geo-replication is still Stopped, and every attempt to start it after this gives the response successful while the status remains Stopped.
>>
>> Please help me get geo-replication up and running again.
>>
>> Best regards
>> Marcus Pedersén
>>
>>
>> Part of geo-replication log from master node:
>>
>> [2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
>> [2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
>> [2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error    cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume   error=2
>> [2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
>> [2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
>> [2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>                  {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,delete}
>> [2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>                  ...
>> [2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monitor', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
>> [2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
>> [2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
>> [2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection       brick=/urd-gds/gluster
>> [2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker   brick=/urd-gds/gluster  slave_node=ssh://geouser@urd-gds-geo-000:gluster://localhost:urd-gds-volume
>> [2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
>> [2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
>> [2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error    cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume   error=2
>> [2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
>> [2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
>> [2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>                  {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,delete}
>> [2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>                  ...
>> [2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monitor', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
>> [2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
>> [2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
>> [2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection       brick=/urd-gds/gluster
>> [2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker    brick=/urd-gds/gluster  slave_node=ssh://geouser@urd-gds-geo-000:gluster://localhost:urd-gds-volume
>> [2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker     brick=/urd-gds/gluster
>> [2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection       brick=/urd-gds/gluster
>> [2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
>> [2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
>>     except:
>>   File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
>>     sys.exit()
>> TypeError: 'int' object is not iterable
>> [2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.
>>
>> ---
>> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>>
>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>>
>>
>
>
>
> --
> Thanks and Regards,
> Kotresh H R
>


-- 
Thanks and Regards,
Kotresh H R

Marcus Pedersén

2018-Jul-18 10:37 UTC


[Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Hi Kotresh,

I ran:

# ldconfig /usr/lib

on all nodes in both clusters, but I still get the same error.

What should I do?


Output for:

# ldconfig -p /usr/lib | grep libgf

    libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
    libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
    libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
    libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
    libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
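One possible reading of this output (an assumption, not something confirmed in the thread): the traceback shows ctypes calling dlopen with the unversioned name libgfchangelog.so, while the cache above lists only libgfchangelog.so.0. dlopen looks a name up exactly as given, so a versioned entry alone would not satisfy that call. The sketch below parses pasted `ldconfig -p` output and reports which case applies; the function names and the SAMPLE constant are hypothetical and introduced only for illustration.

```python
import re

# Matches entries such as:
#   libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
ENTRY = re.compile(r"^\s*(\S+)\s+\(([^)]*)\)\s+=>\s+(\S+)\s*$")

# The output pasted in the message above.
SAMPLE = """\
    libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
    libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
    libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
    libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
    libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
"""


def cached_names(ldconfig_output):
    """Return {library name: path} for every entry in `ldconfig -p` output."""
    names = {}
    for line in ldconfig_output.splitlines():
        match = ENTRY.match(line)
        if match:
            names[match.group(1)] = match.group(3)
    return names


def diagnose(ldconfig_output, wanted="libgfchangelog.so"):
    """Classify whether the exact name `wanted` is present in the cache listing.

    Returns (status, versioned_names) where status is one of
    "ok", "only-versioned" or "missing".
    """
    names = cached_names(ldconfig_output)
    versioned = sorted(n for n in names if n.startswith(wanted + "."))
    if wanted in names:
        return "ok", versioned
    if versioned:
        return "only-versioned", versioned
    return "missing", versioned


if __name__ == "__main__":
    print(diagnose(SAMPLE))
```

If the result is "only-versioned", the library itself is installed and the loader simply has no entry under the unversioned name that gsyncd asks for.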


I read somewhere that you can change some geo-replication settings to speed up the sync, but I cannot remember where I saw that or which config parameters were involved.

Once geo-replication works, I have 30 TB on the master cluster that has to be synced to the slave nodes, and it will take a while before the slave nodes have caught up.


Thanks and regards

Marcus Pedersén


Part of gsyncd.log:

  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207,
in __call__
    raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or
directory
[2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop]
RepceServer: terminating on reaching EOF.
[2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker
died in startup phase     brick=/urd-gds/gluster
[2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main]
<top>: Using session config file      
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main]
<top>: Using session config file     
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-18 10:24:03.335380] I [changelogagent(agent
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-18 10:24:03.343605] I [resource(worker
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between
master and slave...
[2018-07-18 10:24:04.881148] I [resource(worker
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and
slave established.        duration=1.5373
[2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect]
GLUSTER: Mounting gluster volume locally...
[2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect]
GLUSTER: Mounted gluster volume duration=1.0853
[2018-07-18 10:24:05.968028] I [subcmds(worker
/urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker]
<top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110,
in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
line 37, in init
    return Changes.cl_init()
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
line 21, in __getattr__
    from libgfchangelog import Changes as LChanges
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 17, in <module>
    class Changes(object):
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 19, in Changes
    use_errno=True)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in
__init__
    self._handle = _dlopen(self._name, mode)
OSError: libgfchangelog.so: cannot open shared object file: No such file or
directory
[2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__]
RepceClient: call failed   call=1146:139672481965888:1531909445.98 method=init  
error=OSError
[2018-07-18 10:24:05.984747] E [syncdutils(worker
/urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311,
in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72,
in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
1236, in service_loop
    changelog_agent.init()
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225,
in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207,
in __call__
    raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or
directory
[2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop]
RepceServer: terminating on reaching EOF.
[2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker
died in startup phase     brick=/urd-gds/gluster
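When a gsyncd.log excerpt gets this long, it can help to pull out only the error entries. The following is a minimal sketch under stated assumptions: the line layout (timestamp, level letter, module(role):line:function, component, message) is inferred from the entries shown in this thread rather than from any gluster documentation, the names LINE, parse and summarize are hypothetical, and each log entry is assumed to sit on a single line (entries wrapped by the mail archive would need rejoining first).

```python
import re
from collections import Counter

# Layout inferred from the log lines quoted in this thread, for example:
# [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] <top>: call failed:
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<module>\w+)\((?P<role>[^)]*)\):(?P<line>\d+):(?P<func>\w+)\] "
    r"(?P<component>[^:]+): (?P<msg>.*)$"
)

SAMPLE = [
    "[2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: "
    "worker died in startup phase     brick=/urd-gds/gluster",
    "[2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] "
    "<top>: call failed:",
    "Traceback (most recent call last):",
]


def parse(lines):
    """Yield one dict per line that matches the log layout; skip everything else."""
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if match:
            yield match.groupdict()


def summarize(lines):
    """Return (count of entries per level, list of (timestamp, module, message) for level E)."""
    entries = list(parse(lines))
    levels = Counter(entry["level"] for entry in entries)
    errors = [(e["ts"], e["module"], e["msg"]) for e in entries if e["level"] == "E"]
    return levels, errors


if __name__ == "__main__":
    levels, errors = summarize(SAMPLE)
    print(dict(levels))
    for ts, module, msg in errors:
        print(ts, module, msg)
```

Traceback lines do not match the pattern and are skipped, so the summary points to where each failure starts; the full traceback still has to be read in the log itself.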


________________________________
Fr?n: Kotresh Hiremath Ravishankar <khiremat at redhat.com>
Skickat: den 18 juli 2018 06:05
Till: Marcus Peders?n
Kopia: gluster-users at gluster.org
?mne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Hi Marcus,

Well there is nothing wrong in setting up a symlink for gluster binary location,
but
there is a geo-rep command to set it so that gsyncd will search there.

To set on master
#gluster vol geo-rep <mastervol> <slave-vol> config
gluster-command-dir <gluster-binary-location>

To set on slave
#gluster vol geo-rep <mastervol> <slave-vol> config
slave-gluster-command-dir <gluster-binary-location>

Thanks,
Kotresh HR


On Wed, Jul 18, 2018 at 9:28 AM, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
Hi Marcus,

I am testing out 4.1 myself and I will have an update today.
For this particular traceback, gsyncd is not able to find the library.
Is it an RPM install? If so, the gluster libraries would be in /usr/lib.
Please run the commands below.

#ldconfig /usr/lib
#ldconfig -p /usr/lib | grep libgf  (This should list libgfchangelog.so)

Geo-rep should be fixed automatically.
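
The traceback in this thread ends in a ctypes dlopen call made with use_errno=True, so the same lookup can be repeated outside gsyncd. The sketch below is only a diagnostic aid under that assumption. The helper name can_load is hypothetical, and the library name is taken from the error message.

```python
import ctypes
import ctypes.util


def can_load(soname="libgfchangelog.so"):
    """Try to dlopen `soname`; return (True, None) or (False, error text)."""
    try:
        ctypes.CDLL(soname, use_errno=True)
        return True, None
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    # On Linux, find_library typically consults the ldconfig cache,
    # so a None result here suggests the cache does not know the library.
    print("find_library:", ctypes.util.find_library("gfchangelog"))
    ok, err = can_load()
    print("loadable" if ok else "not loadable: " + err)
```

If this reports the library as loadable after running ldconfig, the worker should be able to load it too, provided it runs with the same loader configuration.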

Thanks,
Kotresh HR

On Wed, Jul 18, 2018 at 1:27 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:

Hi again,

I continue to do some testing, but now I have come to a stage where I need help.


gsyncd.log was complaining that /usr/local/sbin/gluster was missing, so I
made a link.

After that /usr/local/sbin/glusterfs was missing, so I made a link there as well.

Both links were made on all slave nodes.


Now I have a new error that I cannot resolve myself.

It cannot open libgfchangelog.so


Many thanks!

Regards

Marcus Pedersén


Part of gsyncd.log:

OSError: libgfchangelog.so: cannot open shared object file: No such file or
directory
[2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop]
RepceServer: terminating on reaching EOF.
[2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker
died in startup phase     brick=/urd-gds/gluster
[2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main]
<top>: Using session config file      
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main]
<top>: Using session config file     
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-17 19:32:17.542363] I [changelogagent(agent
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-17 19:32:17.550894] I [resource(worker
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between
master and slave...
[2018-07-17 19:32:19.166246] I [resource(worker
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and
slave established.        duration=1.6151
[2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect]
GLUSTER: Mounting gluster volume locally...
[2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect]
GLUSTER: Mounted gluster volume duration=1.0901
[2018-07-17 19:32:20.257921] I [subcmds(worker
/urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker]
<top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110,
in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
line 37, in init
    return Changes.cl_init()
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
line 21, in __getattr__
    from libgfchangelog import Changes as LChanges
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 17, in <module>
    class Changes(object):
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 19, in Changes
    use_errno=True)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in
__init__
    self._handle = _dlopen(self._name, mode)
OSError: libgfchangelog.so: cannot open shared object file: No such file or
directory
[2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__]
RepceClient: call failed   call=6078:139982918485824:1531855940.27 method=init  
error=OSError
[2018-07-17 19:32:20.275192] E [syncdutils(worker
/urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311,
in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72,
in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
1236, in service_loop
    changelog_agent.init()
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225,
in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207,
in __call__
    raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or
directory
[2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop]
RepceServer: terminating on reaching EOF.
[2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker
died in startup phase     brick=/urd-gds/gluster
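
Excerpts like the one above mix informational and error entries. A rough Python sketch for pulling out only the E-level entries, together with any traceback lines that follow them, could look like this. It assumes the entry layout visible in this thread ([timestamp] LEVEL [module...] ...), and the function name error_entries is made up for illustration.

```python
import re
import sys

# Start of a log entry as it appears in this thread: "[timestamp] LEVEL ..."
ENTRY_START = re.compile(r"^\[(\d{4}-\d{2}-\d{2} [\d:.]+)\] ([A-Z])(?= |$)")


def error_entries(lines):
    """Yield (timestamp, text) for each E-level entry, keeping continuation lines."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = ENTRY_START.match(line)
        if m:
            # A new entry begins: flush the previous error entry, if any.
            if current:
                yield current[0], "\n".join(current[1])
            current = (m.group(1), [line]) if m.group(2) == "E" else None
        elif current is not None:
            # Traceback or wrapped text belonging to the current error entry.
            current[1].append(line)
    if current:
        yield current[0], "\n".join(current[1])


if __name__ == "__main__":
    # Usage: python this_script.py /path/to/gsyncd.log
    with open(sys.argv[1]) as fh:
        for timestamp, text in error_entries(fh):
            print(text)
            print("-" * 40)
```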



________________________________
From: gluster-users-bounces at gluster.org on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
Sent: 16 July 2018 21:59
To: khiremat at redhat.com
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work


Hi Kotresh,

I have been testing for a bit and, as you can see from the logs I sent before,
permission is denied for geouser on the slave node for the file:

/var/log/glusterfs/cli.log

I have turned SELinux off and, just for testing, changed the permissions on
/var/log/glusterfs/cli.log so that geouser can access it.

After that, starting geo-replication reports success, but all nodes get
status Faulty.


If I run: gluster-mountbroker status

I get:

+-----------------------------+-------------+---------------------------+--------------+--------------------------+
|             NODE            | NODE STATUS |         MOUNT ROOT        |    GROUP     |          USERS           |
+-----------------------------+-------------+---------------------------+--------------+--------------------------+
| urd-gds-geo-001.hgen.slu.se |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |
|       urd-gds-geo-002       |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |
|          localhost          |          UP | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |
+-----------------------------+-------------+---------------------------+--------------+--------------------------+


and that is all the nodes in the slave cluster, so mountbroker seems OK.


gsyncd.log logs an error saying that /usr/local/sbin/gluster is missing.

That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster.

Another error is that SSH between master and slave is broken,

but now that I have changed the permissions on /var/log/glusterfs/cli.log I can run:

ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 geouser@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume

as geouser and that works, which means that the SSH connection works.


Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is
set up?

Is gluster supposed to be in /usr/local/sbin/gluster?


Do I have any options, or should I remove the current geo-replication session and
create a new one?

How much do I need to clean up before creating a new geo-replication session?

In that case, can I pause geo-replication, mount the slave cluster on the master
cluster and run rsync, just to speed up the transfer of files?


Many thanks in advance!

Marcus Pedersén


Part of gsyncd.log:

[2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog]
Popen: command returned error    cmd=ssh -oPasswordAuthentication=no
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
n/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock geouser at
urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume geouser at
urd-gds-geo-001::urd-gds-volu\
me --master-node urd-gds-001 --master-node-id
912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster
--local-node urd-gds-geo-000 --local-node-id
03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
ut 120 --slave-log-level INFO --slave-gluster-log-level INFO
--slave-gluster-command-dir /usr/local/sbin/ error=1
[2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr]
Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed
with ENOENT (No such file or directory)
[2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop]
RepceServer: terminating on reaching EOF.
[2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker
died before establishing connection        brick=/urd-gds/gluster
[2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting
gsyncd worker    brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main]
<top>: Using session config file      
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main]
<top>: Using session config file       
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:06.100481] I [changelogagent(agent
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-16 19:35:06.108834] I [resource(worker
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between
master and slave...
[2018-07-16 19:35:06.762320] E [syncdutils(worker
/urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is
broken
[2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog]
Popen: command returned error   cmd=ssh -oPasswordAuthentication=no
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
n/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock geouser at
urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume geouser at
urd-gds-geo-001::urd-gds-volu\
me --master-node urd-gds-001 --master-node-id
912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster
--local-node urd-gds-geo-000 --local-node-id
03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
ut 120 --slave-log-level INFO --slave-gluster-log-level INFO
--slave-gluster-command-dir /usr/local/sbin/ error=1
[2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr]
Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed
with ENOENT (No such file or directory)
[2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop]
RepceServer: terminating on reaching EOF.
[2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker
died before establishing connection       brick=/urd-gds/gluster
[2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main]
<top>: Using session config file     
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main]
<top>: Using session config file      
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:16.828912] I [changelogagent(agent
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-16 19:35:16.837100] I [resource(worker
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between
master and slave...
[2018-07-16 19:35:17.260257] E [syncdutils(worker
/urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is
broken


________________________________
From: gluster-users-bounces at gluster.org on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
Sent: 13 July 2018 14:50
To: Kotresh Hiremath Ravishankar
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Hi Kotresh,
Yes, all nodes have the same version 4.1.1 both master and slave.
All glusterd are crashing on the master side.
Will send logs tonight.

Thanks,
Marcus

################
Marcus Pedersén
System administrator
Interbull Centre
################
Sent from my phone
################

On 13 July 2018 at 11:28, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
Hi Marcus,

Is the gluster geo-rep version the same on both master and slave?

Thanks,
Kotresh HR

On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:

Hi Kotresh,

I have replaced both files
(gsyncdconfig.py <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py>
and
repce.py <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>)
on all nodes, both master and slave.

I rebooted all servers, but the geo-replication status is still Stopped.

I tried to start geo-replication and got the response Successful, but the status
still shows Stopped on all nodes.

Nothing has been written to the geo-replication logs since I sent the tail of the
log.

So I do not know what info to provide.


Please, help me to find a way to solve this.


Thanks!


Regards

Marcus


________________________________
From: gluster-users-bounces at gluster.org on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
Sent: 12 July 2018 08:51
To: Kotresh Hiremath Ravishankar
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Thanks Kotresh,
I installed through the official centos channel, centos-release-gluster41.
Isn't this fix included in centos install?
I will have a look, test it tonight and come back to you!

Thanks a lot!

Regards
Marcus

################
Marcus Pedersén
System administrator
Interbull Centre
################
Sent from my phone
################

On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
Hi Marcus,

I think the fix [1] is needed in 4.1.
Could you please try this out and let us know if that works for you?

[1] https://review.gluster.org/#/c/20207/

Thanks,
Kotresh HR

On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:

Hi all,

I have upgraded from 3.12.9 to 4.1.1, following the upgrade instructions for an
offline upgrade.

I upgraded the geo-replication side first, 1 x (2+1), and the master side after
that, 2 x (2+1).

Both clusters work the way they should on their own.

After the upgrade on the master side, the status for all geo-replication nodes is
Stopped.

I tried to start geo-replication from a master node and the response was
"started successfully".

The status was again Stopped.

I tried to start it again and got the response "started successfully"; after
that, glusterd crashed on all master nodes.

After a restart of all glusterd the master cluster was up again.

The status for geo-replication is still Stopped, and every attempt to start it
since then gives the response "successful" while the status remains Stopped.


Please help me get the geo-replication up and running again.


Best regards

Marcus Pedersén


Part of geo-replication log from master node:

[2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__]
ChangelogAgent: Agent listining...
[2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote]
SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:49.363514] E
[syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection
to peer is broken
[2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen:
command returned error    cmd=ssh -oPasswordAuthentication=no
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock geouser at
urd-gds-geo-000 /nonexistent/gsyncd --session-owner
5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120
gluster://localhost:urd-gds-volume   error=2
[2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen:
ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen:
ssh>
[2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen:
ssh>                 
{monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen:
ssh>                  ...
[2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen:
ssh> gsyncd.py: error: argument subcmd: invalid choice:
'5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from
'monitor-status', 'monit\
or', 'worker', 'agent', 'slave', 'status',
'config-check', 'config-get', 'config-set',
'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize]
<top>: exiting.
[2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop]
RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize]
<top>: exiting.
[2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker
died before establishing connection       brick=/urd-gds/gluster
[2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting
gsyncd worker   brick=/urd-gds/gluster  slave_node=ssh://geouser at
urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote]
SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__]
ChangelogAgent: Agent listining...
[2018-07-11 18:42:59.945693] E
[syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection
to peer is broken
[2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen:
command returned error    cmd=ssh -oPasswordAuthentication=no
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock geouser at
urd-gds-geo-000 /nonexistent/gsyncd --session-owner
5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120
gluster://localhost:urd-gds-volume   error=2
[2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen:
ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen:
ssh>
[2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen:
ssh>                 
{monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen:
ssh>                  ...
[2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen:
ssh> gsyncd.py: error: argument subcmd: invalid choice:
'5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from
'monitor-status', 'monit\
or', 'worker', 'agent', 'slave', 'status',
'config-check', 'config-get', 'config-set',
'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize]
<top>: exiting.
[2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop]
RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize]
<top>: exiting.
[2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker
died before establishing connection       brick=/urd-gds/gluster
[2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting
gsyncd worker    brick=/urd-gds/gluster  slave_node=ssh://geouser at
urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog
Agent died, Aborting Worker     brick=/urd-gds/gluster
[2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker
died before establishing connection        brick=/urd-gds/gluster
[2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status]
GeorepStatus: Worker Status Change status=inconsistent
[2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception]
<top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
361, in twrap
    except:
  File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line
428, in wmon
    sys.exit()
TypeError: 'int' object is not iterable
[2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>:
exiting.
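
The "invalid choice" entries in the log above have the shape of the error Python's argparse prints when the first positional token is not a registered subcommand. The sketch below reproduces that behaviour with a stand-in parser. It is an assumption that gsyncd.py is built on argparse subparsers in this way. build_parser and exit_code_for are hypothetical names, and only the subcommand list and the argument values are taken from the log.

```python
import argparse

# Subcommand names copied from the usage message in the log above.
SUBCOMMANDS = ["monitor-status", "monitor", "worker", "agent", "slave", "status",
               "config-check", "config-get", "config-set", "config-reset",
               "voluuidget", "delete"]


def build_parser():
    """Hypothetical stand-in for a gsyncd.py-style CLI; not the real parser."""
    parser = argparse.ArgumentParser(prog="gsyncd.py")
    sub = parser.add_subparsers(dest="subcmd")
    for name in SUBCOMMANDS:
        sub.add_parser(name)
    return parser


def exit_code_for(argv):
    """Return the exit status argparse produces for argv (0 if it parses)."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code
    return 0


if __name__ == "__main__":
    # Option-first arguments as they appear in the failing ssh command.
    old_style = ["--session-owner", "5e94eb7d-219f-4741-a179-d4ae6b50c7ee",
                 "--local-id", ".%2Furd-gds%2Fgluster",
                 "--local-node", "urd-gds-001", "-N", "--listen",
                 "--timeout", "120", "gluster://localhost:urd-gds-volume"]
    print("exit status:", exit_code_for(old_style))
```

With this stand-in, the unknown --session-owner option is skipped and its value is taken as the subcommand. That fails with exit status 2, the same status as the error=2 in the log. This pattern appears consistent with one side building the command line in the older option-first style while the receiving gsyncd.py expects a subcommand.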

---
E-mailing SLU will result in SLU processing your personal data. For more
information on how this is done, click here
<https://www.slu.se/en/about-slu/contact-slu/personal-data/>

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org<mailto:Gluster-users at gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-users



--
Thanks and Regards,
Kotresh H R

