Christos Tsalidis
2018-Oct-06 08:58 UTC
[Gluster-users] gluster geo-replication rsync error 3
Hi all,

I am testing a gluster geo-replication setup with glusterfs 3.12.14 on CentOS Linux release 7.5.1804 and I am getting a faulty session due to rsync, which returns error 3. After I start the session, it goes from Initializing to Active and finally to Faulty. Here is what I can see in the logs.

cat /var/log/glusterfs/geo-replication/mastervol/ssh%3A%2F%2Fgeoaccount%4010.0.2.13%3Agluster%3A%2F%2F127.0.0.1%3Aslavevol.log

[2018-10-06 08:55:02.246958] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/bricks/brick-a1/brick slave_node=ssh://geoaccount@servere:gluster://localhost:slavevol
[2018-10-06 08:55:02.503489] I [resource(/bricks/brick-a1/brick):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-10-06 08:55:02.515492] I [changelogagent(/bricks/brick-a1/brick):73:__init__] ChangelogAgent: Agent listining...
[2018-10-06 08:55:04.571449] I [resource(/bricks/brick-a1/brick):1787:connect_remote] SSH: SSH connection between master and slave established. duration=2.0676
[2018-10-06 08:55:04.571890] I [resource(/bricks/brick-a1/brick):1502:connect] GLUSTER: Mounting gluster volume locally...
[2018-10-06 08:55:05.693440] I [resource(/bricks/brick-a1/brick):1515:connect] GLUSTER: Mounted gluster volume duration=1.1212
[2018-10-06 08:55:05.693741] I [gsyncd(/bricks/brick-a1/brick):799:main_i] <top>: Closing feedback fd, waking up the monitor
[2018-10-06 08:55:07.711970] I [master(/bricks/brick-a1/brick):1518:register] _GMaster: Working dir path=/var/lib/misc/glusterfsd/mastervol/ssh%3A%2F%2Fgeoaccount%4010.0.2.13%3Agluster%3A%2F%2F127.0.0.1%3Aslavevol/9517ac67e25c7491f03ba5e2506505bd
[2018-10-06 08:55:07.712357] I [resource(/bricks/brick-a1/brick):1662:service_loop] GLUSTER: Register time time=1538816107
[2018-10-06 08:55:07.764151] I [master(/bricks/brick-a1/brick):490:mgmt_lock] _GMaster: Got lock Becoming ACTIVE brick=/bricks/brick-a1/brick
[2018-10-06 08:55:07.768949] I [gsyncdstatus(/bricks/brick-a1/brick):276:set_active] GeorepStatus: Worker Status Change status=Active
[2018-10-06 08:55:07.770529] I [gsyncdstatus(/bricks/brick-a1/brick):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-10-06 08:55:07.770975] I [master(/bricks/brick-a1/brick):1432:crawl] _GMaster: starting history crawl turns=1 stime=(1538745843, 0) entry_stime=None etime=1538816107
[2018-10-06 08:55:08.773402] I [master(/bricks/brick-a1/brick):1461:crawl] _GMaster: slave's time stime=(1538745843, 0)
[2018-10-06 08:55:09.262964] I [master(/bricks/brick-a1/brick):1863:syncjob] Syncer: Sync Time Taken duration=0.0606 num_files=1 job=2 return_code=3
[2018-10-06 08:55:09.263253] E [resource(/bricks/brick-a1/brick):210:errlog] Popen: command returned error cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-wVbxGU/05b8d7b5dab75575689c0e1a2ec33b3f.sock --compress geoaccount@servere:/proc/12335/cwd error=3
[2018-10-06 08:55:09.275593] I [syncdutils(/bricks/brick-a1/brick):271:finalize] <top>: exiting.
[2018-10-06 08:55:09.279442] I [repce(/bricks/brick-a1/brick):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-10-06 08:55:09.279936] I [syncdutils(/bricks/brick-a1/brick):271:finalize] <top>: exiting.
[2018-10-06 08:55:09.698153] I [monitor(monitor):363:monitor] Monitor: worker died in startup phase brick=/bricks/brick-a1/brick
[2018-10-06 08:55:09.707330] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-10-06 08:55:19.888017] I [monitor(monitor):280:monitor] Monitor: starting gsyncd worker brick=/bricks/brick-a1/brick slave_node=ssh://geoaccount@servere:gluster://localhost:slavevol
[2018-10-06 08:55:20.140819] I [resource(/bricks/brick-a1/brick):1780:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-10-06 08:55:20.141815] I [changelogagent(/bricks/brick-a1/brick):73:__init__] ChangelogAgent: Agent listining...
[2018-10-06 08:55:22.245625] I [resource(/bricks/brick-a1/brick):1787:connect_remote] SSH: SSH connection between master and slave established. duration=2.1046
[2018-10-06 08:55:22.246062] I [resource(/bricks/brick-a1/brick):1502:connect] GLUSTER: Mounting gluster volume locally...
[2018-10-06 08:55:23.370100] I [resource(/bricks/brick-a1/brick):1515:connect] GLUSTER: Mounted gluster volume duration=1.1238
[2018-10-06 08:55:23.370507] I [gsyncd(/bricks/brick-a1/brick):799:main_i] <top>: Closing feedback fd, waking up the monitor
[2018-10-06 08:55:25.388721] I [master(/bricks/brick-a1/brick):1518:register] _GMaster: Working dir path=/var/lib/misc/glusterfsd/mastervol/ssh%3A%2F%2Fgeoaccount%4010.0.2.13%3Agluster%3A%2F%2F127.0.0.1%3Aslavevol/9517ac67e25c7491f03ba5e2506505bd
[2018-10-06 08:55:25.388978] I [resource(/bricks/brick-a1/brick):1662:service_loop] GLUSTER: Register time time=1538816125
[2018-10-06 08:55:25.405546] I [master(/bricks/brick-a1/brick):490:mgmt_lock] _GMaster: Got lock Becoming ACTIVE brick=/bricks/brick-a1/brick
[2018-10-06 08:55:25.408958] I [gsyncdstatus(/bricks/brick-a1/brick):276:set_active] GeorepStatus: Worker Status Change status=Active
[2018-10-06 08:55:25.410522] I [gsyncdstatus(/bricks/brick-a1/brick):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-10-06 08:55:25.411005] I [master(/bricks/brick-a1/brick):1432:crawl] _GMaster: starting history crawl turns=1 stime=(1538745843, 0) entry_stime=None etime=1538816125
[2018-10-06 08:55:26.413892] I [master(/bricks/brick-a1/brick):1461:crawl] _GMaster: slave's time stime=(1538745843, 0)
[2018-10-06 08:55:26.933149] I [master(/bricks/brick-a1/brick):1863:syncjob] Syncer: Sync Time Taken duration=0.0549 num_files=1 job=3 return_code=3
[2018-10-06 08:55:26.933419] E [resource(/bricks/brick-a1/brick):210:errlog] Popen: command returned error cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-Oq_aPL/05b8d7b5dab75575689c0e1a2ec33b3f.sock --compress geoaccount@servere:/proc/12489/cwd error=3
[2018-10-06 08:55:26.953044] I [syncdutils(/bricks/brick-a1/brick):271:finalize] <top>: exiting.
[2018-10-06 08:55:26.956691] I [repce(/bricks/brick-a1/brick):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-10-06 08:55:26.957233] I [syncdutils(/bricks/brick-a1/brick):271:finalize] <top>: exiting.
[2018-10-06 08:55:27.378103] I [monitor(monitor):363:monitor] Monitor: worker died in startup phase brick=/bricks/brick-a1/brick
[2018-10-06 08:55:27.382554] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty

[root@servera ~]# gluster volume info mastervol

Volume Name: mastervol
Type: Replicate
Volume ID: b7ec0647-b101-4240-9abf-32f24f2decec
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: servera:/bricks/brick-a1/brick
Brick2: serverb:/bricks/brick-b1/brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
cluster.enable-shared-storage: enable

[root@servere ~]# gluster volume info slavevol

Volume Name: slavevol
Type: Replicate
Volume ID: 8b431b4e-5dc4-4db6-9608-3b82cce5024c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: servere:/bricks/brick-e1/brick
Brick2: servere:/bricks/brick-e2/brick
Options Reconfigured:
features.read-only: off
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off

Do you have any idea how I can solve this?

Many thanks!
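For reference, rsync's exit code 3 means "Errors selecting input/output files, dirs" (see the EXIT VALUES section of rsync(1)). A minimal set of checks for a Faulty worker like this, assuming the session name implied by the logs above (mastervol syncing to geoaccount@servere::slavevol; substitute your own names), would look roughly like:

# Per-brick worker state and last-synced time:
gluster volume geo-replication mastervol geoaccount@servere::slavevol status detail

# Dump the session configuration (access_mount and friends):
gluster volume geo-replication mastervol geoaccount@servere::slavevol config

# rsync must be present on both ends and reasonably recent; the failing
# command uses --ignore-missing-args, which needs rsync >= 3.1.0.
# Check on the master node and on servere:
rsync --version

# The slave-side gsyncd logs often carry the actual rsync error text
# (run on servere):
ls /var/log/glusterfs/geo-replication-slaves/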
Hi Christos,

A few months ago I had a similar problem, but on Ubuntu 16.04. At that time Kotresh gave me a hint: https://www.spinics.net/lists/gluster-users/msg33694.html

gluster volume geo-replication <mastervol> <slavehost>::<slavevol> config access_mount true

That hint solved my problem on Ubuntu 16.04. Hope that helps...

Best regards
Dietmar
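A sketch of how that hint is typically applied, using the session names from Christos' logs (mastervol and geoaccount@servere::slavevol) as placeholders; stopping the session around the config change is only a precaution:

# Stop the session, set access_mount, then start it again:
gluster volume geo-replication mastervol geoaccount@servere::slavevol stop
gluster volume geo-replication mastervol geoaccount@servere::slavevol config access_mount true
gluster volume geo-replication mastervol geoaccount@servere::slavevol start

# Confirm the option is set and watch the workers leave the Faulty state:
gluster volume geo-replication mastervol geoaccount@servere::slavevol config access_mount
gluster volume geo-replication mastervol geoaccount@servere::slavevol status

As far as I understand it, with access_mount enabled gsyncd keeps its auxiliary gluster mounts at an accessible path instead of hiding them with a lazy unmount, which is what the /proc/<pid>/cwd destination in the failing rsync command relies on.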
On 06.10.2018 10:58, Christos Tsalidis wrote:
> Hi all,
>
> I am testing a gluster geo-replication setup with glusterfs 3.12.14 on CentOS Linux release 7.5.1804 and getting a faulty session due to rsync. It returns error 3.
> [...]
> Do you have any idea how I can solve this?
>
> Many thanks!