Michael Roth
2018-Oct-25 12:10 UTC
[Gluster-users] Geo Replication OSError: [Errno 107] Transport endpoint is not connected
I have a big problem. When I start geo-replication, everything seems fine, but after replicating about 2.5 TB I get errors, and the replication starts over and over again with the same errors. I have two nodes with a replicated volume plus a third arbiter node. The destination node is a single node. The firewall between all nodes is open.

Master Log

[2018-10-25 07:08:59.619699] D [master(/gluster/owncloud/brick2):1665:Xcrawl] _GMaster: entering ./data/fa/files/backup/research/projects/2011-Regularity/2012-03-Gain-of-Regularity-linearWFP
[2018-10-25 07:08:59.619874] E [syncdutils(/gluster/owncloud/brick2):325:log_raise_exception] <top>: glusterfs session went down    error=ENOTCONN
[2018-10-25 07:08:59.620109] E [syncdutils(/gluster/owncloud/brick2):331:log_raise_exception] <top>: FULL EXCEPTION TRACE:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 801, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1679, in service_loop
    g1.crawlwrap(oneshot=True, register_time=register_time)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555, in crawl
    self.process([item[1]], 0)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1204, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1143, in process_change
    st = lstat(go[0])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 553, in lstat
    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE, EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 535, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/5c143d64-165f-44b1-98ed-71e491376a76'
[2018-10-25 07:08:59.627846] D [master(/gluster/owncloud/brick2):1665:Xcrawl] _GMaster: entering ./data/fa/files/backup/research/projects/2011-Regularity/resources
[2018-10-25 07:08:59.632826] D [master(/gluster/owncloud/brick2):1665:Xcrawl] _GMaster: entering ./data/fa/files/backup/research/projects/2011-Regularity/add material
[2018-10-25 07:08:59.633582] D [master(/gluster/owncloud/brick2):1665:Xcrawl] _GMaster: entering ./data/fa/files/backup/research/projects/2011-Regularity/add material/Maple
[2018-10-25 07:08:59.636306] D [master(/gluster/owncloud/brick2):1665:Xcrawl] _GMaster: entering ./data/fa/files/backup/research/projects/2011-Regularity/add material/notes
[2018-10-25 07:08:59.637303] I [syncdutils(/gluster/owncloud/brick2):271:finalize] <top>: exiting.
[2018-10-25 07:08:59.640778] I [repce(/gluster/owncloud/brick2):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-10-25 07:08:59.641222] I [syncdutils(/gluster/owncloud/brick2):271:finalize] <top>: exiting.
[2018-10-25 07:09:00.314140] I [monitor(monitor):363:monitor] Monitor: worker died in startup phase    brick=/gluster/owncloud/brick2
[2018-10-25 07:09:00.315172] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change    status=Faulty
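If it helps: as far as I can tell from the traceback above, the failing call is just an os.lstat() on the file's virtual .gfid path inside the gsyncd aux mount, where ENOENT is tolerated (and ESTALE/EBUSY retried) but ENOTCONN bubbles up as "glusterfs session went down". A rough stand-alone sketch of that check (the mount point below is only a placeholder, not my actual aux mount path):

import errno
import os

# Placeholder path for illustration only; the real gsyncd aux mount
# is a temporary mount created by gsyncd on the master node.
AUX_MOUNT = "/tmp/gsyncd-aux-mount"
GFID = "5c143d64-165f-44b1-98ed-71e491376a76"  # the GFID from the traceback above

def check_gfid(mount, gfid):
    """Roughly what gsyncd's lstat()/errno_wrap() do in the traceback:
    stat the file through its virtual .gfid path and inspect the errno."""
    path = os.path.join(mount, ".gfid", gfid)
    try:
        os.lstat(path)
        print("lstat OK - mount and file are reachable")
    except OSError as e:
        if e.errno == errno.ENOENT:
            print("ENOENT - file no longer exists (gsyncd tolerates this)")
        elif e.errno == errno.ENOTCONN:
            print("ENOTCONN - the aux mount itself is no longer connected")
        else:
            raise

check_gfid(AUX_MOUNT, GFID)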
Slave Log

[2018-10-25 07:08:44.206372] I [resource(slave):1502:connect] GLUSTER: Mounting gluster volume locally...
[2018-10-25 07:08:45.229620] I [resource(slave):1515:connect] GLUSTER: Mounted gluster volume    duration=1.0229
[2018-10-25 07:08:45.230180] I [resource(slave):1012:service_loop] GLUSTER: slave listening
[2018-10-25 07:08:59.641242] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-10-25 07:08:59.655611] I [syncdutils(slave):271:finalize] <top>: exiting.

Volume Info

Volume Name: datacloud
Type: Replicate
Volume ID: 6cc79599-7a5c-4b02-bd86-13020a9d91db
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 172.17.45.11:/gluster/datacloud/brick2
Brick2: 172.17.45.12:/gluster/datacloud/brick2
Brick3: 172.17.45.13:/gluster/datacloud/brick2 (arbiter)
Options Reconfigured:
cluster.server-quorum-type: server
cluster.shd-max-threads: 32
cluster.self-heal-readdir-size: 64KB
cluster.quorum-type: fixed
transport.address-family: inet
diagnostics.brick-log-level: INFO
changelog.capture-del-path: on
storage.build-pgfid: on
changelog.changelog: on
geo-replication.ignore-pid-check: on
server.statedump-path: /tmp/gluster
cluster.self-heal-window-size: 32
geo-replication.indexing: on
nfs.trusted-sync: off
diagnostics.dump-fd-stats: off
nfs.disable: on
cluster.self-heal-daemon: enable
cluster.background-self-heal-count: 16
cluster.heal-timeout: 120
cluster.data-self-heal-algorithm: full
cluster.consistent-metadata: on
network.ping-timeout: 20
cluster.granular-entry-heal: enable
cluster.server-quorum-ratio: 51%
cluster.enable-shared-storage: enable

Best regards,
Michael

--
Michael Roth | michael.roth at tuwien.ac.at
IT Solutions - Application Management
Technische Universität Wien - Operngasse 11, 1040 Wien
T +43-1-58801-42091