Marcus Pedersén
2018-Jul-13 12:50 UTC
[Gluster-users] Upgrade to 4.1.1 geo-replication does not work
Hi Kotresh,

Yes, all nodes have the same version, 4.1.1, on both master and slave. All glusterd are crashing on the master side. Will send logs tonight.

Thanks,
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

On 13 July 2018 at 11:28, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:

Hi Marcus,

Is the gluster geo-rep version the same on both master and slave?

Thanks,
Kotresh HR

On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:

Hi Kotresh,

I have replaced both files (gsyncdconfig.py <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py> and repce.py <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>) on all nodes, both master and slave. I rebooted all servers, but geo-replication status is still Stopped. I tried to start geo-replication and the response was Successful, but the status still shows Stopped on all nodes. Nothing has been written to the geo-replication logs since I sent the tail of the log, so I do not know what info to provide. Please help me find a way to solve this.

Thanks!
Regards
Marcus

________________________________
From: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
Sent: 12 July 2018 08:51
To: Kotresh Hiremath Ravishankar
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Thanks Kotresh,

I installed through the official CentOS channel, centos-release-gluster41. Isn't this fix included in the CentOS install? I will have a look, test it tonight and come back to you!

Thanks a lot!
Regards
Marcus

################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################

On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:

Hi Marcus,

I think the fix [1] is needed in 4.1. Could you please try this out and let us know if that works for you?

[1] https://review.gluster.org/#/c/20207/

Thanks,
Kotresh HR

On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:

Hi all,

I have upgraded from 3.12.9 to 4.1.1, following the upgrade instructions for offline upgrade. I upgraded the geo-replication side first, 1 x (2+1), and the master side after that, 2 x (2+1). Both clusters work the way they should on their own.

After the upgrade on the master side, the status for all geo-replication nodes is Stopped. I tried to start geo-replication from a master node and the response was "started successfully". Status again... Stopped. I tried to start it again and got the response "started successfully"; after that, all glusterd crashed on all master nodes. After a restart of all glusterd the master cluster was up again. The status for geo-replication is still Stopped, and every attempt to start it after this gives the response successful but the status is still Stopped.

Please help me get the geo-replication up and running again.

Best regards
Marcus Pedersén

Part of geo-replication log from master node:

[2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
[2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
[2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
[2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
[2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,delete}
[2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
[2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monitor', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://geouser@urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
[2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
[2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
[2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
[2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,delete}
[2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
[2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monitor', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://geouser@urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
[2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
[2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
    except:
  File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
    sys.exit()
TypeError: 'int' object is not iterable
[2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.

---
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

--
Thanks and Regards,
Kotresh H R
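The "invalid choice" message in the log above has the shape Python's argparse produces when the first positional argument is not one of the registered subcommands. The following is a minimal sketch, not gsyncd's actual code: the subcommand names are copied from the usage text in the log, while `build_parser` and `try_parse` are hypothetical names used only for illustration. It shows how an old-style argument list beginning with `--session-owner <uuid>` ends up with the UUID being read as the subcommand.

```python
import argparse

# Subcommand names copied from the usage text in the log above.
SUBCOMMANDS = [
    "monitor-status", "monitor", "worker", "agent", "slave", "status",
    "config-check", "config-get", "config-set", "config-reset",
    "voluuidget", "delete",
]


def build_parser():
    """Hypothetical stand-in for a subcommand-style CLI (not gsyncd's real parser)."""
    parser = argparse.ArgumentParser(prog="gsyncd.py")
    subparsers = parser.add_subparsers(dest="subcmd")
    for name in SUBCOMMANDS:
        subparsers.add_parser(name)
    return parser


def try_parse(argv):
    """Return the chosen subcommand, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv).subcmd
    except SystemExit:
        # argparse prints its usage/error text to stderr and exits with status 2.
        return None


if __name__ == "__main__":
    # New-style invocation: the first positional argument is a known subcommand.
    print(try_parse(["slave"]))
    # Old-style invocation: the unknown option is set aside, and its value
    # is then interpreted as the subcommand, which is rejected.
    print(try_parse(["--session-owner", "5e94eb7d-219f-4741-a179-d4ae6b50c7ee"]))
```

If this reading is right, the error would point to the calling side still using the pre-upgrade argument format while the receiving side expects a subcommand, which is consistent with the question about matching versions on master and slave.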
Marcus Pedersén
2018-Jul-13 19:30 UTC
[Gluster-users] Upgrade to 4.1.1 geo-replication does not work
Hi again,

I made a mistake when replacing the Python files: I missed the SELinux context. I fixed this, but it makes no difference. All nodes in geo-replication are still in status Stopped, and on start the response is successful but the status is still Stopped.

I enclose glusterd.log and gsyncd.log and hope that they can give some clues.

Many thanks for your help!

Best regards
Marcus Pedersén
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterd.log
Type: text/x-log
Size: 658457 bytes
Desc: glusterd.log
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180713/5d4efde3/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gsyncd.log
Type: text/x-log
Size: 81246 bytes
Desc: gsyncd.log
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180713/5d4efde3/attachment-0003.bin>
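For anyone going through logs like the attached gsyncd.log, a small filter for the error entries can help. This is a minimal sketch that assumes every entry follows the layout seen in the excerpts in this thread, `[timestamp] LEVEL [module(context):line:function] source: message`; the names `ENTRY_RE`, `parse_entry` and `error_entries` are hypothetical, and lines that do not match (such as traceback lines) are simply skipped.

```python
import re

# Pattern inferred from the log excerpts in this thread, for example:
# [2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
ENTRY_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"   # [2018-07-11 18:42:49.363514]
    r"(?P<level>[A-Z])\s+"             # single-letter level, e.g. I or E
    r"\[(?P<origin>[^\]]+)\]\s+"       # [module(context):line:function]
    r"(?P<message>.*)$"                # rest of the line
)


def parse_entry(line):
    """Split one log line into its fields; return None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    return match.groupdict() if match else None


def error_entries(lines):
    """Yield parsed entries whose level is E (error)."""
    for line in lines:
        entry = parse_entry(line)
        if entry and entry["level"] == "E":
            yield entry
```

Passing an open file object to `error_entries` and printing each entry's `timestamp` and `message` gives a short list of the failures without the surrounding informational lines.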
Marcus Pedersén
2018-Jul-16 19:59 UTC
[Gluster-users] Upgrade to 4.1.1 geo-replication does not work
Hi Kotresh,

I have been testing for a bit, and as you can see from the logs I sent before, permission is denied for geouser on the slave node for the file /var/log/glusterfs/cli.log.

I have turned SELinux off and, just for testing, changed the permissions on /var/log/glusterfs/cli.log so that geouser can access it. Starting geo-replication after that gives the response successful, but all nodes get status Faulty.

If I run:

gluster-mountbroker status

I get:

+-----------------------------+-------------+---------------------------+--------------+--------------------------+
| NODE                        | NODE STATUS | MOUNT ROOT                | GROUP        | USERS                    |
+-----------------------------+-------------+---------------------------+--------------+--------------------------+
| urd-gds-geo-001.hgen.slu.se | UP          | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |
| urd-gds-geo-002             | UP          | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |
| localhost                   | UP          | /var/mountbroker-root(OK) | geogroup(OK) | geouser(urd-gds-volume)  |
+-----------------------------+-------------+---------------------------+--------------+--------------------------+

and that is all the nodes in the slave cluster, so mountbroker seems OK.

gsyncd.log logs an error that /usr/local/sbin/gluster is missing. That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster.

Another error is that SSH between master and slave is broken, but now that I have changed the permissions on /var/log/glusterfs/cli.log I can run:

ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 geouser@urd-gds-geo-001 gluster --xml --remote-host=localhost volume info urd-gds-volume

as geouser and that works, which means that the SSH connection works.

Are the permissions on /var/log/glusterfs/cli.log changed when geo-replication is set up?
Is gluster supposed to be in /usr/local/sbin/gluster?
Do I have any options, or should I remove the current geo-replication and create a new one?
How much do I need to clean up before creating a new geo-replication?
In that case, can I pause geo-replication, mount the slave cluster on the master cluster and run rsync, just to speed up the transfer of files?

Many thanks in advance!
Marcus Pedersén

Part of the gsyncd.log:

[2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock geouser@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
[2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
[2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:06.100481] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-16 19:35:06.108834] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-16 19:35:06.762320] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
[2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock geouser@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume --master-node urd-gds-001 --master-node-id 912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster --local-node urd-gds-geo-000 --local-node-id 03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
[2018-07-16 19:35:06.763398] E [syncdutils(worker /urd-gds/gluster):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT (No such file or directory)
[2018-07-16 19:35:06.771905] I [repce(agent /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-16 19:35:06.772272] I [monitor(monitor):262:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-16 19:35:16.786387] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
[2018-07-16 19:35:16.828056] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:16.828066] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:16.828912] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-16 19:35:16.837100] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-16 19:35:17.260257] E [syncdutils(worker /urd-gds/gluster):303:log_raise_exception] <top>: connection to peer is broken
Thanks, Marcus ################ Marcus Peders?n Systemadministrator Interbull Centre ################ Sent from my phone ################ Den 13 juli 2018 11:28 skrev Kotresh Hiremath Ravishankar <khiremat at redhat.com>: Hi Marcus, Is the gluster geo-rep version is same on both master and slave? Thanks, Kotresh HR On Fri, Jul 13, 2018 at 1:26 AM, Marcus Peders?n <marcus.pedersen at slu.se<mailto:marcus.pedersen at slu.se>> wrote: Hi Kotresh, i have replaced both files (gsyncdconfig.py<https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py> and repce.py<https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>) in all nodes both master and slave. I rebooted all servers but geo-replication status is still Stopped. I tried to start geo-replication with response Successful but status still show Stopped on all nodes. Nothing has been written to geo-replication logs since I sent the tail of the log. So I do not know what info to provide? Please, help me to find a way to solve this. Thanks! Regards Marcus ________________________________ Fr?n: gluster-users-bounces at gluster.org<mailto:gluster-users-bounces at gluster.org> <gluster-users-bounces at gluster.org<mailto:gluster-users-bounces at gluster.org>> f?r Marcus Peders?n <marcus.pedersen at slu.se<mailto:marcus.pedersen at slu.se>> Skickat: den 12 juli 2018 08:51 Till: Kotresh Hiremath Ravishankar Kopia: gluster-users at gluster.org<mailto:gluster-users at gluster.org> ?mne: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work Thanks Kotresh, I installed through the official centos channel, centos-release-gluster41. Isn't this fix included in centos install? I will have a look, test it tonight and come back to you! Thanks a lot! 
Regards
Marcus

################
Marcus Pedersén
System Administrator
Interbull Centre
################
Sent from my phone
################

On 12 July 2018 at 07:41, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:

Hi Marcus,

I think the fix [1] is needed in 4.1. Could you please try this out and let us know if that works for you?

[1] https://review.gluster.org/#/c/20207/

Thanks,
Kotresh HR

On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:

Hi all,

I have upgraded from 3.12.9 to 4.1.1, following the upgrade instructions for an offline upgrade.

I upgraded the geo-replication side first, 1 x (2+1), and the master side after that, 2 x (2+1). Both clusters work the way they should on their own.

After the upgrade on the master side, the status for all geo-replication nodes is Stopped. I tried to start geo-replication from a master node and the response was "started successfully". Status again: Stopped.

I tried to start it again and got the response "started successfully"; after that, glusterd crashed on all master nodes. After a restart of all glusterd the master cluster was up again.

The status for geo-replication is still Stopped, and every attempt to start it since then gives the response "successful" while the status stays Stopped.

Please help me get geo-replication up and running again.

Best regards
Marcus Pedersén

Part of the geo-replication log from a master node:

[2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
[2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:49.363514] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
[2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
[2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
[2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,delete}
[2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
[2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monitor', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://geouser@urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] ChangelogAgent: Agent listining...
[2018-07-11 18:42:59.945693] E [syncdutils(/urd-gds/gluster):304:log_raise_exception] <top>: connection to peer is broken
[2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 gluster://localhost:urd-gds-volume error=2
[2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh>
[2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> {monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,delete}
[2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> ...
[2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):214:logerr] Popen: ssh> gsyncd.py: error: argument subcmd: invalid choice: '5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monitor', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:59.948046] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:59.951392] I [repce(/urd-gds/gluster):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:59.951760] I [syncdutils(/urd-gds/gluster):271:finalize] <top>: exiting.
[2018-07-11 18:42:59.951817] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:43:10.54580] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=ssh://geouser@urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:43:10.88356] I [monitor(monitor):345:monitor] Monitor: Changelog Agent died, Aborting Worker brick=/urd-gds/gluster
[2018-07-11 18:43:10.88613] I [monitor(monitor):353:monitor] Monitor: worker died before establishing connection brick=/urd-gds/gluster
[2018-07-11 18:43:20.112435] I [gsyncdstatus(monitor):242:set_worker_status] GeorepStatus: Worker Status Change status=inconsistent
[2018-07-11 18:43:20.112885] E [syncdutils(monitor):331:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
    except:
  File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 428, in wmon
    sys.exit()
TypeError: 'int' object is not iterable
[2018-07-11 18:43:20.114610] I [syncdutils(monitor):271:finalize] <top>: exiting.

---
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

--
Thanks and Regards,
Kotresh H R
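
A note on the "invalid choice" error in the master log earlier in this thread: the usage text printed over ssh has the shape that Python's argparse produces for a parser with subcommands, while the command line sent by the master starts with --session-owner instead of a subcommand. The sketch below is a minimal stand-in, not the real gsyncd.py: the function names build_parser and exit_code_for are hypothetical, the subcommand list is copied from the usage message, and the idea that the slave side parses its arguments this way is an assumption drawn from the log. It shows how such a parser takes the first non-option word (here the session UUID) as the subcommand and rejects it.

```python
import argparse

# Subcommand names copied from the usage message in the log.
SUBCOMMANDS = [
    "monitor-status", "monitor", "worker", "agent", "slave", "status",
    "config-check", "config-get", "config-set", "config-reset",
    "voluuidget", "delete",
]


def build_parser():
    """Hypothetical stand-in for a subcommand-style gsyncd.py parser."""
    parser = argparse.ArgumentParser(prog="gsyncd.py")
    subparsers = parser.add_subparsers(dest="subcmd")
    for name in SUBCOMMANDS:
        subparsers.add_parser(name)
    return parser


def exit_code_for(argv):
    """Return 0 if argv parses, otherwise the exit code argparse raised."""
    try:
        # parse_known_args tolerates unknown options such as --session-owner,
        # so any failure comes from the positional subcommand check.
        build_parser().parse_known_args(argv)
    except SystemExit as exc:
        return exc.code
    return 0


# Old-style argument list, as it appears in the failing ssh command.
OLD_STYLE_ARGS = [
    "--session-owner", "5e94eb7d-219f-4741-a179-d4ae6b50c7ee",
    "--local-id", ".%2Furd-gds%2Fgluster",
    "--local-node", "urd-gds-001",
    "-N", "--listen", "--timeout", "120",
    "gluster://localhost:urd-gds-volume",
]

if __name__ == "__main__":
    # argparse writes its "invalid choice" message to stderr before exiting.
    print("old-style arguments ->", exit_code_for(OLD_STYLE_ARGS))
    print("subcommand 'status' ->", exit_code_for(["status"]))
```

argparse exits with status 2 on a usage error, which is consistent with the error=2 reported for the ssh command. If this reading is right, the log points to the two ends of the session disagreeing about the gsyncd command-line format rather than to an SSH connectivity problem, but that remains an inference from the log and not a confirmed diagnosis.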