Tom Fite
2017-May-15 19:44 UTC
[Gluster-users] Geo-replication 'faulty' status during initial sync, 'Transport endpoint is not connected'
Hi all,

I've hit a strange problem with geo-replication. On gluster 3.10.1, I have set up geo-replication between my replicated / distributed instance and a remote replicated / distributed instance. The master and slave instances are connected via VPN. Initially the geo-replication setup was working fine: after the initial sync the session reported "Active" with a "Changelog Crawl" status, and I confirmed that files were synced between the two gluster instances. Something must have changed between then and now, because about a week after the instance had been online it switched to a "Faulty" status.

[root@master-gfs1 ~]# gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 status

MASTER NODE                MASTER VOL    MASTER BRICK        SLAVE USER    SLAVE                          SLAVE NODE                STATUS     CRAWL STATUS    LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
master-gfs1.tomfite.com    gv0           /data/brick1/gv0    root          slave-gfs1.tomfite.com::gv0    N/A                       Faulty     N/A             N/A
master-gfs1.tomfite.com    gv0           /data/brick2/gv0    root          slave-gfs1.tomfite.com::gv0    N/A                       Faulty     N/A             N/A
master-gfs1.tomfite.com    gv0           /data/brick3/gv0    root          slave-gfs1.tomfite.com::gv0    N/A                       Faulty     N/A             N/A
master-gfs2.tomfite.com    gv0           /data/brick1/gv0    root          slave-gfs1.tomfite.com::gv0    slave-gfs1.tomfite.com    Passive    N/A             N/A
master-gfs2.tomfite.com    gv0           /data/brick2/gv0    root          slave-gfs1.tomfite.com::gv0    slave-gfs1.tomfite.com    Passive    N/A             N/A
master-gfs2.tomfite.com    gv0           /data/brick3/gv0    root          slave-gfs1.tomfite.com::gv0    slave-gfs1.tomfite.com    Passive    N/A             N/A

From the logs (see below) it seems like there is an issue trying to sync files to the slave, as I get a "Transport endpoint is not connected" error when gsyncd attempts to sync the first set of files.

Here's what I've tried so far (the commands for steps 1, 3, and 4 are sketched below):

1. ssh_port is currently configured on a non-standard port. I switched the port to the standard 22 but observed no change in behavior.
2. I verified that SELinux is disabled on all boxes, and that there are no firewalls running.
3. The remote_gsyncd setting was set to '/nonexistent/gsyncd', which looked incorrect, so I changed it to a valid location for that executable, /usr/libexec/glusterfs/gsyncd.
4. In an attempt to start the slave from scratch, I removed all files from the slave and reset the geo-replication instance by deleting and recreating the session.
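For reference, the config changes in steps 1 and 3 and the session reset in step 4 were done roughly like this (reconstructed from memory, so treat the exact invocations as a sketch rather than a transcript):

[root@master-gfs1 ~]# gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 config ssh_port 22
[root@master-gfs1 ~]# gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 config remote_gsyncd /usr/libexec/glusterfs/gsyncd
[root@master-gfs1 ~]# gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 stop
[root@master-gfs1 ~]# gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 delete
[root@master-gfs1 ~]# gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 create push-pem force
[root@master-gfs1 ~]# gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 start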
Debug logs when trying to start geo-replication:

[2017-05-15 16:31:32.940068] I [gsyncd(conf):689:main_i] <top>: Config Set: session-owner = d37a7455-0b1b-402e-985b-cf1ace4e513e
[2017-05-15 16:31:33.293926] D [monitor(monitor):434:distribute] <top>: master bricks: [{'host': 'master-gfs1.tomfite.com', 'uuid': 'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick1/gv0'}, {'host': 'master-gfs2.tomfite.com', 'uuid': 'bdbb7a18-3ecf-4733-a5df-447d8c712af5', 'dir': '/data/brick1/gv0'}, {'host': 'master-gfs1.tomfite.com', 'uuid': 'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick2/gv0'}, {'host': 'master-gfs2.tomfite.com', 'uuid': 'bdbb7a18-3ecf-4733-a5df-447d8c712af5', 'dir': '/data/brick2/gv0'}, {'host': 'master-gfs1.tomfite.com', 'uuid': 'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick3/gv0'}, {'host': 'master-gfs2.tomfite.com', 'uuid': 'bdbb7a18-3ecf-4733-a5df-447d8c712af5', 'dir': '/data/brick3/gv0'}]
[2017-05-15 16:31:33.294250] D [monitor(monitor):443:distribute] <top>: slave SSH gateway: slave-gfs1.tomfite.com
[2017-05-15 16:31:33.424451] D [monitor(monitor):464:distribute] <top>: slave bricks: [{'host': 'slave-gfs1.tomfite.com', 'uuid': 'c184bc78-cff0-4cef-8c6a-e637ab52b324', 'dir': '/data/brick1/gv0'}, {'host': 'slave-gfs2.tomfite.com', 'uuid': '7290f265-0709-45fc-86ef-2ff5125d31e1', 'dir': '/data/brick1/gv0'}, {'host': 'slave-gfs1.tomfite.com', 'uuid': 'c184bc78-cff0-4cef-8c6a-e637ab52b324', 'dir': '/data/brick2/gv0'}, {'host': 'slave-gfs2.tomfite.com', 'uuid': '7290f265-0709-45fc-86ef-2ff5125d31e1', 'dir': '/data/brick2/gv0'}, {'host': 'slave-gfs1.tomfite.com', 'uuid': 'c184bc78-cff0-4cef-8c6a-e637ab52b324', 'dir': '/data/brick3/gv0'}, {'host': 'slave-gfs2.tomfite.com', 'uuid': '7290f265-0709-45fc-86ef-2ff5125d31e1', 'dir': '/data/brick3/gv0'}]
[2017-05-15 16:31:33.424927] D [monitor(monitor):119:is_hot] Volinfo: brickpath: 'master-gfs1.tomfite.com:/data/brick1/gv0'
[2017-05-15 16:31:33.425452] D [monitor(monitor):119:is_hot] Volinfo: brickpath: 'master-gfs1.tomfite.com:/data/brick2/gv0'
[2017-05-15 16:31:33.425790] D [monitor(monitor):119:is_hot] Volinfo: brickpath: 'master-gfs1.tomfite.com:/data/brick3/gv0'
[2017-05-15 16:31:33.426130] D [monitor(monitor):489:distribute] <top>: worker specs: [({'host': 'master-gfs1.tomfite.com', 'uuid': 'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick1/gv0'}, 'ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0', '1', False), ({'host': 'master-gfs1.tomfite.com', 'uuid': 'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick2/gv0'}, 'ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0', '2', False), ({'host': 'master-gfs1.tomfite.com', 'uuid': 'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick3/gv0'}, 'ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0', '3', False)]
[2017-05-15 16:31:33.429359] I [gsyncdstatus(monitor):241:set_worker_status] GeorepStatus: Worker Status: Initializing...
[2017-05-15 16:31:33.432882] I [gsyncdstatus(monitor):241:set_worker_status] GeorepStatus: Worker Status: Initializing...
[2017-05-15 16:31:33.435489] I [gsyncdstatus(monitor):241:set_worker_status] GeorepStatus: Worker Status: Initializing...
[2017-05-15 16:31:33.574393] I [monitor(monitor):74:get_slave_bricks_status] <top>: Unable to get list of up nodes of gv0, returning empty list: Another transaction is in progress for gv0. Please try again after sometime.
[2017-05-15 16:31:33.574764] I [monitor(monitor):275:monitor] Monitor: starting gsyncd worker(/data/brick2/gv0). Slave node: ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0
[2017-05-15 16:31:33.578641] I [monitor(monitor):74:get_slave_bricks_status] <top>: Unable to get list of up nodes of gv0, returning empty list: Another transaction is in progress for gv0. Please try again after sometime.
[2017-05-15 16:31:33.579119] I [monitor(monitor):275:monitor] Monitor: starting gsyncd worker(/data/brick1/gv0). Slave node: ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0
[2017-05-15 16:31:33.585609] I [monitor(monitor):275:monitor] Monitor: starting gsyncd worker(/data/brick3/gv0). Slave node: ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0
[2017-05-15 16:31:33.671281] D [gsyncd(/data/brick1/gv0):765:main_i] <top>: rpc_fd: '9,12,11,10'
[2017-05-15 16:31:33.672070] I [changelogagent(/data/brick1/gv0):73:__init__] ChangelogAgent: Agent listining...
[2017-05-15 16:31:33.673501] D [gsyncd(/data/brick3/gv0):765:main_i] <top>: rpc_fd: '8,11,10,9'
[2017-05-15 16:31:33.674078] I [changelogagent(/data/brick3/gv0):73:__init__] ChangelogAgent: Agent listining...
[2017-05-15 16:31:33.676042] D [gsyncd(/data/brick2/gv0):765:main_i] <top>: rpc_fd: '9,14,13,11'
[2017-05-15 16:31:33.676713] I [changelogagent(/data/brick2/gv0):73:__init__] ChangelogAgent: Agent listining...
[2017-05-15 16:31:33.695128] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865893.7 __repce_version__() ...
[2017-05-15 16:31:33.696594] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865893.7 __repce_version__() ...
[2017-05-15 16:31:33.706545] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865893.71 __repce_version__() ...
[2017-05-15 16:31:39.342730] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865893.7 __repce_version__ -> 1.0
[2017-05-15 16:31:39.343020] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865899.34 version() ...
[2017-05-15 16:31:39.343569] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865893.71 __repce_version__ -> 1.0
[2017-05-15 16:31:39.343859] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865899.34 version() ...
[2017-05-15 16:31:39.349275] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865893.7 __repce_version__ -> 1.0
[2017-05-15 16:31:39.349540] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865899.35 version() ...
[2017-05-15 16:31:39.349998] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865899.34 version -> 1.0
[2017-05-15 16:31:39.350292] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865899.35 pid() ...
[2017-05-15 16:31:39.350780] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865899.34 version -> 1.0
[2017-05-15 16:31:39.351070] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865899.35 pid() ...
[2017-05-15 16:31:39.356405] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865899.35 version -> 1.0
[2017-05-15 16:31:39.356715] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865899.36 pid() ...
[2017-05-15 16:31:39.357254] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865899.35 pid -> 19304
[2017-05-15 16:31:39.357983] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865899.35 pid -> 19305
[2017-05-15 16:31:39.363502] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865899.36 pid -> 19303
[2017-05-15 16:31:43.453656] D [resource(/data/brick3/gv0):1332:inhibit] DirectMounter: auxiliary glusterfs mount in place
[2017-05-15 16:31:43.462914] D [resource(/data/brick1/gv0):1332:inhibit] DirectMounter: auxiliary glusterfs mount in place
[2017-05-15 16:31:43.464389] D [resource(/data/brick2/gv0):1332:inhibit] DirectMounter: auxiliary glusterfs mount in place
[2017-05-15 16:31:44.478801] D [resource(/data/brick3/gv0):1387:inhibit] DirectMounter: auxiliary glusterfs mount prepared
[2017-05-15 16:31:44.479312] D [master(/data/brick3/gv0):101:gmaster_builder] <top>: setting up xsync change detection mode
[2017-05-15 16:31:44.479366] D [monitor(monitor):350:monitor] Monitor: worker(/data/brick3/gv0) connected
[2017-05-15 16:31:44.480387] D [master(/data/brick3/gv0):101:gmaster_builder] <top>: setting up changelog change detection mode
[2017-05-15 16:31:44.481631] D [master(/data/brick3/gv0):101:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2017-05-15 16:31:44.485300] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865904.49 version() ...
[2017-05-15 16:31:44.485999] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865904.49 version -> 1.0
[2017-05-15 16:31:44.486202] D [master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650
[2017-05-15 16:31:44.486382] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865904.49 init() ...
[2017-05-15 16:31:44.487781] D [resource(/data/brick1/gv0):1387:inhibit] DirectMounter: auxiliary glusterfs mount prepared
[2017-05-15 16:31:44.488292] D [master(/data/brick1/gv0):101:gmaster_builder] <top>: setting up xsync change detection mode
[2017-05-15 16:31:44.488245] D [monitor(monitor):350:monitor] Monitor: worker(/data/brick1/gv0) connected
[2017-05-15 16:31:44.489343] D [master(/data/brick1/gv0):101:gmaster_builder] <top>: setting up changelog change detection mode
[2017-05-15 16:31:44.489279] D [resource(/data/brick2/gv0):1387:inhibit] DirectMounter: auxiliary glusterfs mount prepared
[2017-05-15 16:31:44.489826] D [master(/data/brick2/gv0):101:gmaster_builder] <top>: setting up xsync change detection mode
[2017-05-15 16:31:44.489825] D [monitor(monitor):350:monitor] Monitor: worker(/data/brick2/gv0) connected
[2017-05-15 16:31:44.490509] D [master(/data/brick1/gv0):101:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2017-05-15 16:31:44.491131] D [master(/data/brick2/gv0):101:gmaster_builder] <top>: setting up changelog change detection mode
[2017-05-15 16:31:44.493197] D [master(/data/brick2/gv0):101:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2017-05-15 16:31:44.493820] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865904.49 version() ...
[2017-05-15 16:31:44.494577] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865904.49 version -> 1.0
[2017-05-15 16:31:44.494801] D [master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd
[2017-05-15 16:31:44.494982] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865904.49 init() ...
[2017-05-15 16:31:44.495695] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865904.5 version() ...
[2017-05-15 16:31:44.496423] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865904.5 version -> 1.0
[2017-05-15 16:31:44.496617] D [master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8
[2017-05-15 16:31:44.496607] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865904.49 init -> None
[2017-05-15 16:31:44.496813] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865904.5 init() ...
[2017-05-15 16:31:44.496891] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865904.5 register('/data/brick3/gv0', '/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650', '/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.%2Fdata%2Fbrick3%2Fgv0-changes.log', 7, 5) ...
[2017-05-15 16:31:44.505940] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865904.49 init -> None
[2017-05-15 16:31:44.506314] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865904.51 register('/data/brick1/gv0', '/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd', '/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.%2Fdata%2Fbrick1%2Fgv0-changes.log', 7, 5) ...
[2017-05-15 16:31:44.507751] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865904.5 init -> None
[2017-05-15 16:31:44.508045] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865904.51 register('/data/brick2/gv0', '/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8', '/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.%2Fdata%2Fbrick2%2Fgv0-changes.log', 7, 5) ...
[2017-05-15 16:31:46.605554] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865904.5 register -> None
[2017-05-15 16:31:46.605916] D [master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650
[2017-05-15 16:31:46.606117] D [master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650
[2017-05-15 16:31:46.606285] D [master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650
[2017-05-15 16:31:46.606420] I [master(/data/brick3/gv0):1328:register] _GMaster: Working dir: /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650
[2017-05-15 16:31:46.606653] I [resource(/data/brick3/gv0):1604:service_loop] GLUSTER: Register time: 1494865906
[2017-05-15 16:31:46.607355] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140597264365312:1494865906.61 keep_alive(None,) ...
[2017-05-15 16:31:46.610795] D [master(/data/brick3/gv0):540:crawlwrap] _GMaster: primary master with volume id d37a7455-0b1b-402e-985b-cf1ace4e513e ...
[2017-05-15 16:31:46.615416] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140597264365312:1494865906.61 keep_alive -> 1
[2017-05-15 16:31:46.622519] I [gsyncdstatus(/data/brick3/gv0):272:set_active] GeorepStatus: Worker Status: Active
[2017-05-15 16:31:46.623460] I [gsyncdstatus(/data/brick3/gv0):245:set_worker_crawl_status] GeorepStatus: Crawl Status: History Crawl
[2017-05-15 16:31:46.623876] I [master(/data/brick3/gv0):1244:crawl] _GMaster: starting history crawl... turns: 1, stime: (1492459926, 0), etime: 1494865906, entry_stime: (1492459926, 0)
[2017-05-15 16:31:46.624118] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865906.62 history('/data/brick3/gv0/.glusterfs/changelogs', 1492459926, 1494865906, 3) ...
[2017-05-15 16:31:46.639169] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865906.62 history -> (0, 1494865893L)
[2017-05-15 16:31:46.639429] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865906.64 history_scan() ...
[2017-05-15 16:31:46.671082] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865904.51 register -> None
[2017-05-15 16:31:46.671462] D [master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd
[2017-05-15 16:31:46.671639] D [master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd
[2017-05-15 16:31:46.671840] D [master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd
[2017-05-15 16:31:46.671979] I [master(/data/brick1/gv0):1328:register] _GMaster: Working dir: /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd
[2017-05-15 16:31:46.671940] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865904.51 register -> None
[2017-05-15 16:31:46.672233] I [resource(/data/brick1/gv0):1604:service_loop] GLUSTER: Register time: 1494865906
[2017-05-15 16:31:46.672239] D [master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8
[2017-05-15 16:31:46.672440] D [master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8
[2017-05-15 16:31:46.672616] D [master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8
[2017-05-15 16:31:46.672787] I [master(/data/brick2/gv0):1328:register] _GMaster: Working dir: /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8
[2017-05-15 16:31:46.673033] I [resource(/data/brick2/gv0):1604:service_loop] GLUSTER: Register time: 1494865906
[2017-05-15 16:31:46.673294] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139688752957184:1494865906.67 keep_alive(None,) ...
[2017-05-15 16:31:46.674438] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140395904235264:1494865906.67 keep_alive(None,) ...
[2017-05-15 16:31:46.675556] D [master(/data/brick1/gv0):540:crawlwrap] _GMaster: primary master with volume id d37a7455-0b1b-402e-985b-cf1ace4e513e ...
[2017-05-15 16:31:46.677221] D [master(/data/brick2/gv0):540:crawlwrap] _GMaster: primary master with volume id d37a7455-0b1b-402e-985b-cf1ace4e513e ...
[2017-05-15 16:31:46.680387] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139688752957184:1494865906.67 keep_alive -> 1
[2017-05-15 16:31:46.681812] I [gsyncdstatus(/data/brick1/gv0):272:set_active] GeorepStatus: Worker Status: Active
[2017-05-15 16:31:46.682248] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140395904235264:1494865906.67 keep_alive -> 1
[2017-05-15 16:31:46.682954] I [gsyncdstatus(/data/brick1/gv0):245:set_worker_crawl_status] GeorepStatus: Crawl Status: History Crawl
[2017-05-15 16:31:46.683324] I [master(/data/brick1/gv0):1244:crawl] _GMaster: starting history crawl... turns: 1, stime: (1492459922, 0), etime: 1494865906, entry_stime: (1492459922, 0)
[2017-05-15 16:31:46.683530] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865906.68 history('/data/brick1/gv0/.glusterfs/changelogs', 1492459922, 1494865906, 3) ...
[2017-05-15 16:31:46.683958] I [gsyncdstatus(/data/brick2/gv0):272:set_active] GeorepStatus: Worker Status: Active
[2017-05-15 16:31:46.684827] I [gsyncdstatus(/data/brick2/gv0):245:set_worker_crawl_status] GeorepStatus: Crawl Status: History Crawl
[2017-05-15 16:31:46.685203] I [master(/data/brick2/gv0):1244:crawl] _GMaster: starting history crawl... turns: 1, stime: (1492459925, 0), etime: 1494865906, entry_stime: (1492459925, 0)
[2017-05-15 16:31:46.685420] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865906.69 history('/data/brick2/gv0/.glusterfs/changelogs', 1492459925, 1494865906, 3) ...
[2017-05-15 16:31:46.702970] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865906.68 history -> (0, 1494865893L)
[2017-05-15 16:31:46.703003] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865906.69 history -> (0, 1494865897L)
[2017-05-15 16:31:46.703197] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865906.7 history_scan() ...
[2017-05-15 16:31:46.703249] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865906.7 history_scan() ...
[2017-05-15 16:31:46.703787] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865906.7 history_scan -> 1
[2017-05-15 16:31:46.703988] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865906.7 history_getchanges() ...
[2017-05-15 16:31:46.704641] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865906.7 history_getchanges -> ['/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd/.history/.processing/CHANGELOG.1492459923']
[2017-05-15 16:31:46.704828] I [master(/data/brick1/gv0):1272:crawl] _GMaster: slave's time: (1492459922, 0)
[2017-05-15 16:31:46.704973] D [master(/data/brick1/gv0):1183:changelogs_batch_process] _GMaster: processing changes ['/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd/.history/.processing/CHANGELOG.1492459923']
[2017-05-15 16:31:46.705100] D [master(/data/brick1/gv0):1038:process] _GMaster: processing change /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd/.history/.processing/CHANGELOG.1492459923
[2017-05-15 16:31:46.706136] D [master(/data/brick1/gv0):948:process_change] _GMaster: entries: [{'uid': 10000006, 'gfid': 'bf3b90bd-34a5-4265-98a6-54e7a783c142', 'gid': 25961, 'mode': 33200, 'entry': '.gfid/598cc6d2-b95e-4ba2-9a70-d1a9c0f752ce/file-946-of-5000-at-1.00KB', 'op': 'CREATE'}, ...
... /* omitted many file paths to sync */ ...
[2017-05-15 16:31:46.737530] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865906.71 entry_ops -> []
[2017-05-15 16:31:46.741244] E [syncdutils(/data/brick1/gv0):291:log_raise_exception] <top>: glusterfs session went down [ENOTCONN]
[2017-05-15 16:31:46.741379] E [syncdutils(/data/brick1/gv0):297:log_raise_exception] <top>: FULL EXCEPTION TRACE:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 204, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 780, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1610, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 600, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1281, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1184, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1039, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 986, in process_change
    st = lstat(go[0])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 490, in lstat
    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 473, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/accf7915-d1dc-4869-86d9-60722ccdf9c4'

Current geo-replication config:

special_sync_mode: partial
gluster_log_file: /var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
ssh_port: 20022
change_detector: changelog
session_owner: d37a7455-0b1b-402e-985b-cf1ace4e513e
state_file: /var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/monitor.status
gluster_params: aux-gfid-mount acl
log_level: DEBUG
remote_gsyncd: /usr/libexec/glusterfs/gsyncd
working_dir: /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0
state_detail_file: /var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0-detail.status
gluster_command_dir: /usr/sbin/
pid_file: /var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/monitor.pid
georep_session_working_dir: /var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/
ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
master.stime_xattr_name: trusted.glusterfs.d37a7455-0b1b-402e-985b-cf1ace4e513e.30970990-6acb-4f33-a1f2-5c2056004818.stime
changelog_log_file: /var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0-changes.log
socketdir: /var/run/gluster
volume_id: d37a7455-0b1b-402e-985b-cf1ace4e513e
ignore_deletes: false
state_socket_unencoded: /var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.socket
log_file: /var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.log

Gluster volume status on master:

Gluster process                                   TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick master-gfs1.tomfite.com:/data/brick1/gv0    49152     0          Y       3989
Brick master-gfs2.tomfite.com:/data/brick1/gv0    49152     0          Y       3610
Brick master-gfs1.tomfite.com:/data/brick2/gv0    49153     0          Y       4000
Brick master-gfs2.tomfite.com:/data/brick2/gv0    49153     0          Y       3621
Brick master-gfs1.tomfite.com:/data/brick3/gv0    49154     0          Y       4010
Brick master-gfs2.tomfite.com:/data/brick3/gv0    49154     0          Y       3632
Snapshot Daemon on localhost                      49197     0          Y       4946
NFS Server on localhost                           N/A       N/A        N       N/A
Self-heal Daemon on localhost                     N/A       N/A        Y       2885
Snapshot Daemon on master-gfs2.tomfite.com        49197     0          Y       4600
NFS Server on master-gfs2.tomfite.com             N/A       N/A        N       N/A
Self-heal Daemon on master-gfs2.tomfite.com       N/A       N/A        Y       2856

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

Gluster volume status on slave:

Gluster process                                   TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick slave-gfs1.tomfite.com:/data/brick1/gv0     49152     0          Y       3688
Brick slave-gfs2.tomfite.com:/data/brick1/gv0     49152     0          Y       3701
Brick slave-gfs1.tomfite.com:/data/brick2/gv0     49153     0          Y       3696
Brick slave-gfs2.tomfite.com:/data/brick2/gv0     49153     0          Y       3695
Brick slave-gfs1.tomfite.com:/data/brick3/gv0     49154     0          Y       3702
Brick slave-gfs2.tomfite.com:/data/brick3/gv0     49154     0          Y       3707
NFS Server on localhost                           N/A       N/A        N       N/A
Self-heal Daemon on localhost                     N/A       N/A        Y       2630
NFS Server on slave-gfs2.tomfite.com              N/A       N/A        N       N/A
Self-heal Daemon on slave-gfs2.tomfite.com        N/A       N/A        Y       2635

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

Anybody have any other ideas for me to check out?
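P.S. For what it's worth, here is my rough reading of why a single error takes the whole worker down. The lstat in the trace runs against the auxiliary gfid-access mount the worker sets up, and gsyncd's errno_wrap tolerates ENOENT and retries ESTALE, but anything else (such as ENOTCONN from a dead mount) propagates, so the monitor restarts the worker and the session shows Faulty. A minimal sketch of that behavior, assuming I'm reading syncdutils.py correctly (simplified, not the actual gsyncd source):

    import errno
    import os
    import time

    def errno_wrap(call, args=(), ignore_errnos=(), retry_errnos=(), retries=5):
        # Simplified stand-in for gsyncd's syncdutils.errno_wrap: swallow
        # some errnos, retry others a few times, re-raise everything else.
        for attempt in range(retries):
            try:
                return call(*args)
            except OSError as e:
                if e.errno in ignore_errnos:
                    return None   # e.g. ENOENT: file already gone, skip it
                if e.errno in retry_errnos and attempt < retries - 1:
                    time.sleep(1) # e.g. ESTALE: retry after a short wait
                    continue
                raise             # e.g. ENOTCONN (107): propagates, worker dies

    if __name__ == '__main__':
        # Demo with a path that exists, so the call simply succeeds; the
        # failing call in my trace is the same shape but with a .gfid/<gfid>
        # path relative to the aux mount:
        print(errno_wrap(os.lstat, ['/tmp'], [errno.ENOENT], [errno.ESTALE]))

So the question, I suppose, is why the auxiliary mount loses its connection in the first place.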