Matthew Benstead
2020-Oct-05 19:11 UTC
[Gluster-users] Gluster7 GeoReplication Operation not permitted and incomplete sync
Thanks Felix - looking through some more of the logs I may have found the
reason... From
/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/mnt-data-storage_a-storage.log

[2020-10-05 18:13:35.736838] E [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage
[2020-10-05 18:18:53.885591] E [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage
[2020-10-05 18:22:14.405234] E [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage
[2020-10-05 18:25:53.971679] E [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage
[2020-10-05 18:31:44.571557] E [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage
[2020-10-05 18:36:36.508772] E [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage
[2020-10-05 18:40:10.401055] E [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage
[2020-10-05 18:42:57.833536] E [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage
[2020-10-05 18:45:19.691953] E [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage
[2020-10-05 18:48:26.478532] E [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage
[2020-10-05 18:52:24.466914] E [fuse-bridge.c:4288:fuse_xattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage

The slave nodes are running gluster on top of ZFS, but I had configured ACLs -
is there something else missing to make this work with ZFS?

[root at pcic-backup01 ~]# gluster volume info

Volume Name: pcic-backup
Type: Distribute
Volume ID: 7af8a424-f4b6-4405-bba1-0dbafb0fa231
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.0.231.81:/pcic-backup01-zpool/brick
Brick2: 10.0.231.82:/pcic-backup02-zpool/brick
Options Reconfigured:
network.ping-timeout: 10
performance.cache-size: 256MB
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.parallel-readdir: on
performance.readdir-ahead: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
nfs.disable: on
features.read-only: off
performance.open-behind: off

[root at pcic-backup01 ~]# zfs get acltype pcic-backup01-zpool
NAME                 PROPERTY  VALUE     SOURCE
pcic-backup01-zpool  acltype   posixacl  local

[root at pcic-backup01 ~]# grep "pcic-backup0" /proc/mounts
pcic-backup01-zpool /pcic-backup01-zpool zfs rw,seclabel,xattr,posixacl 0 0

[root at pcic-backup02 ~]# zfs get acltype pcic-backup02-zpool
NAME                 PROPERTY  VALUE     SOURCE
pcic-backup02-zpool  acltype   posixacl  local

[root at pcic-backup02 ~]# grep "pcic-backup0" /proc/mounts
pcic-backup02-zpool /pcic-backup02-zpool zfs rw,seclabel,xattr,posixacl 0 0

Thanks,
 -Matthew

--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium <https://pacificclimate.org/>
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb at uvic.ca
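A quick check of how the slave pools store extended attributes might help
here. This is only a sketch, using the pool names from the output above; it
assumes the relevant property is the ZFS xattr mode, which the /proc/mounts
lines do not show (they confirm posixacl but not whether xattr=on or xattr=sa
is in effect):

    # on each slave node: show how xattrs and ACLs are stored on the brick pool
    zfs get xattr,acltype pcic-backup01-zpool     # run on pcic-backup01
    zfs get xattr,acltype pcic-backup02-zpool     # run on pcic-backup02

xattr=on (the default) stores attributes in hidden directories, while xattr=sa
keeps them with the inode; xattr=sa is the setting commonly suggested for
Gluster bricks on ZFS.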
On 10/5/20 1:39 AM, Felix Kölzow wrote:
> Dear Matthew,
>
> can you provide more information regarding the geo-replication brick
> logs?
>
> These files are also located in:
>
> /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/
>
> Usually, these log files are more precise for figuring out the root cause
> of the error.
>
> Additionally, it is also worth looking at the log files on the slave
> side.
>
> Regards,
>
> Felix
>
> On 01/10/2020 23:08, Matthew Benstead wrote:
>> Hello,
>>
>> I'm looking for some help with a GeoReplication Error in my Gluster
>> 7/CentOS 7 setup. Replication progress has basically stopped, and the
>> status of the replication keeps switching.
>>
>> The gsyncd log has errors like "Operation not permitted", "incomplete
>> sync", etc... help? I'm not sure how to proceed in troubleshooting this.
>>
>> The log is here; it basically just repeats - from:
>> /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log
>>
>> [2020-10-01 20:52:15.291923] I [master(worker /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken  duration=32.8466  num_files=1749  job=3  return_code=23
>> [2020-10-01 20:52:18.700062] I [master(worker /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken  duration=43.1210  num_files=3167  job=6  return_code=23
>> [2020-10-01 20:52:23.383234] W [master(worker /data/storage_c/storage):1393:process] _GMaster: incomplete sync, retrying changelogs  files=['XSYNC-CHANGELOG.1601585397']
>> [2020-10-01 20:52:28.537657] E [repce(worker /data/storage_b/storage):213:__call__] RepceClient: call failed  call=258187:140538843596608:1601585515.63  method=entry_ops  error=OSError
>> [2020-10-01 20:52:28.538064] E [syncdutils(worker /data/storage_b/storage):339:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 332, in main
>>     func(args)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 86, in subcmd_worker
>>     local.service_loop(remote)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1308, in service_loop
>>     g1.crawlwrap(oneshot=True, register_time=register_time)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 602, in crawlwrap
>>     self.crawl()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1682, in crawl
>>     self.process([item[1]], 0)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1327, in process
>>     self.process_change(change, done, retry)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1221, in process_change
>>     failures = self.slave.server.entry_ops(entries)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 232, in __call__
>>     return self.ins(self.meth, *a)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 214, in __call__
>>     raise res
>> OSError: [Errno 1] Operation not permitted
>> [2020-10-01 20:52:28.570316] I [repce(agent /data/storage_b/storage):96:service_loop] RepceServer: terminating on reaching EOF.
>> [2020-10-01 20:52:28.613603] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change  status=Faulty
>> [2020-10-01 20:52:29.619797] I [master(worker /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken  duration=5.6458  num_files=455  job=3  return_code=23
>> [2020-10-01 20:52:38.286245] I [master(worker /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken  duration=14.1824  num_files=1333  job=2  return_code=23
>> [2020-10-01 20:52:38.628156] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change  status=Initializing...
>> [2020-10-01 20:52:38.628325] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker  brick=/data/storage_b/storage  slave_node=10.0.231.82
>> [2020-10-01 20:52:38.684736] I [gsyncd(agent /data/storage_b/storage):318:main] <top>: Using session config file  path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
>> [2020-10-01 20:52:38.687213] I [gsyncd(worker /data/storage_b/storage):318:main] <top>: Using session config file  path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
>> [2020-10-01 20:52:38.687401] I [changelogagent(agent /data/storage_b/storage):72:__init__] ChangelogAgent: Agent listining...
>> [2020-10-01 20:52:38.703295] I [resource(worker /data/storage_b/storage):1386:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2020-10-01 20:52:40.388372] I [resource(worker /data/storage_b/storage):1435:connect_remote] SSH: SSH connection between master and slave established.  duration=1.6849
>> [2020-10-01 20:52:40.388582] I [resource(worker /data/storage_b/storage):1105:connect] GLUSTER: Mounting gluster volume locally...
>> [2020-10-01 20:52:41.501105] I [resource(worker /data/storage_b/storage):1128:connect] GLUSTER: Mounted gluster volume  duration=1.1123
>> [2020-10-01 20:52:41.501405] I [subcmds(worker /data/storage_b/storage):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>> [2020-10-01 20:52:43.531146] I [master(worker /data/storage_b/storage):1640:register] _GMaster: Working dir  path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage
>> [2020-10-01 20:52:43.533953] I [resource(worker /data/storage_b/storage):1291:service_loop] GLUSTER: Register time  time=1601585563
>> [2020-10-01 20:52:43.547092] I [gsyncdstatus(worker /data/storage_b/storage):281:set_active] GeorepStatus: Worker Status Change  status=Active
>> [2020-10-01 20:52:43.561920] I [gsyncdstatus(worker /data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change  status=History Crawl
>> [2020-10-01 20:52:43.562184] I [master(worker /data/storage_b/storage):1554:crawl] _GMaster: starting history crawl  turns=1  stime=None  entry_stime=None  etime=1601585563
>> [2020-10-01 20:52:43.562269] I [resource(worker /data/storage_b/storage):1307:service_loop] GLUSTER: No stime available, using xsync crawl
>> [2020-10-01 20:52:43.569799] I [master(worker /data/storage_b/storage):1670:crawl] _GMaster: starting hybrid crawl  stime=None
>> [2020-10-01 20:52:43.573528] I [gsyncdstatus(worker /data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change  status=Hybrid Crawl
>> [2020-10-01 20:52:44.370985] I [master(worker /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken  duration=20.4307  num_files=2609  job=5  return_code=23
>> [2020-10-01 20:52:49.431854] W [master(worker /data/storage_c/storage):1393:process] _GMaster: incomplete sync, retrying changelogs  files=['XSYNC-CHANGELOG.1601585397']
>> [2020-10-01 20:52:54.801500] I [master(worker /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken  duration=72.7492  num_files=4227  job=2  return_code=23
>> [2020-10-01 20:52:56.766547] I [master(worker /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken  duration=74.3569  num_files=4674  job=5  return_code=23
>> [2020-10-01 20:53:18.853333] I [master(worker /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken  duration=28.7125  num_files=4397  job=3  return_code=23
>> [2020-10-01 20:53:21.224921] W [master(worker /data/storage_a/storage):1393:process] _GMaster: incomplete sync, retrying changelogs  files=['CHANGELOG.1601044033', 'CHANGELOG.1601044048', 'CHANGELOG.1601044063', 'CHANGELOG.1601044078', 'CHANGELOG.1601044093', 'CHANGELOG.1601044108', 'CHANGELOG.1601044123']
>> [2020-10-01 20:53:22.134536] I [master(worker /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken  duration=0.2159  num_files=3  job=3  return_code=23
>> [2020-10-01 20:53:25.615712] I [master(worker /data/storage_b/storage):1681:crawl] _GMaster: processing xsync changelog  path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage/xsync/XSYNC-CHANGELOG.1601585563
>> [2020-10-01 20:53:25.634970] W [master(worker /data/storage_c/storage):1393:process] _GMaster: incomplete sync, retrying changelogs  files=['XSYNC-CHANGELOG.1601585397']
>>
>> GeoReplication status - see it change from Active to Faulty:
>>
>> [root at storage01 ~]# gluster volume geo-replication status
>>
>> MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER    SLAVE                                        SLAVE NODE     STATUS    CRAWL STATUS       LAST_SYNCED
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> 10.0.231.91    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 07:26:57
>> 10.0.231.91    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
>> 10.0.231.91    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
>> 10.0.231.92    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.82    Active    History Crawl      2020-09-23 01:56:05
>> 10.0.231.92    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
>> 10.0.231.92    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
>> 10.0.231.93    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 06:55:57
>> 10.0.231.93    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
>> 10.0.231.93    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
>>
>> [root at storage01 ~]# gluster volume geo-replication status
>>
>> MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER    SLAVE                                        SLAVE NODE     STATUS    CRAWL STATUS       LAST_SYNCED
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> 10.0.231.91    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 07:26:57
>> 10.0.231.91    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
>> 10.0.231.91    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A
>> 10.0.231.92    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.82    Active    History Crawl      2020-09-23 01:58:05
>> 10.0.231.92    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
>> 10.0.231.92    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A
>> 10.0.231.93    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 06:58:56
>> 10.0.231.93    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
>> 10.0.231.93    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A
>>
>> Cluster information: (Note - disabled performance.open-behind to work
>> around https://github.com/gluster/glusterfs/issues/1440 )
>>
>> [root at storage01 ~]# gluster --version | head -1; cat /etc/centos-release; uname -r
>> glusterfs 7.7
>> CentOS Linux release 7.8.2003 (Core)
>> 3.10.0-1127.10.1.el7.x86_64
>>
>> [root at storage01 ~]# df -h /storage2/
>> Filesystem            Size  Used Avail Use% Mounted on
>> 10.0.231.91:/storage  328T  228T  100T  70% /storage2
>>
>> [root at storage01 ~]# gluster volume info
>>
>> Volume Name: storage
>> Type: Distributed-Replicate
>> Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 3 x (2 + 1) = 9
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.0.231.91:/data/storage_a/storage
>> Brick2: 10.0.231.92:/data/storage_b/storage
>> Brick3: 10.0.231.93:/data/storage_c/storage (arbiter)
>> Brick4: 10.0.231.92:/data/storage_a/storage
>> Brick5: 10.0.231.93:/data/storage_b/storage
>> Brick6: 10.0.231.91:/data/storage_c/storage (arbiter)
>> Brick7: 10.0.231.93:/data/storage_a/storage
>> Brick8: 10.0.231.91:/data/storage_b/storage
>> Brick9: 10.0.231.92:/data/storage_c/storage (arbiter)
>> Options Reconfigured:
>> changelog.changelog: on
>> geo-replication.ignore-pid-check: on
>> geo-replication.indexing: on
>> network.ping-timeout: 10
>> features.inode-quota: on
>> features.quota: on
>> nfs.disable: on
>> features.quota-deem-statfs: on
>> storage.fips-mode-rchecksum: on
>> performance.readdir-ahead: on
>> performance.parallel-readdir: on
>> cluster.lookup-optimize: on
>> client.event-threads: 4
>> server.event-threads: 4
>> performance.cache-size: 256MB
>> performance.open-behind: off
>>
>> Thanks,
>>  -Matthew
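One way to narrow down whether the "Operation not permitted" and "extended
attribute not supported" errors above come from the slave brick filesystems
themselves is a manual xattr round trip on a slave brick path. This is only a
sketch; the attribute name below is arbitrary and chosen for illustration:

    # on a slave node, as root, against one of the brick directories
    setfattr -n trusted.georep.selftest -v ok /pcic-backup01-zpool/brick
    getfattr -n trusted.georep.selftest -e text /pcic-backup01-zpool/brick
    setfattr -x trusted.georep.selftest /pcic-backup01-zpool/brick    # remove the test attribute

If these commands fail on the brick directory, the problem sits below Gluster
in the ZFS layer; if they succeed, the errors are more likely specific to the
attributes geo-replication sets through its aux FUSE mount.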
Felix Kölzow
2020-Oct-05 19:53 UTC
[Gluster-users] Gluster7 GeoReplication Operation not permitted and incomplete sync
Dear Matthew,

this is our configuration:

zfs get all mypool
mypool  xattr    sa        local
mypool  acltype  posixacl  local

Something more to consider?

Regards,

Felix

On 05/10/2020 21:11, Matthew Benstead wrote:
> Thanks Felix - looking through some more of the logs I may have found
> the reason...
> [...]
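Felix's pool has xattr=sa, whereas the earlier output for the slave pools only
showed acltype=posixacl, so the xattr property is one plausible difference. A
sketch of aligning the slave pools with that configuration - assuming xattr=sa
is appropriate for these pools and that the session name matches the status
output above - could be:

    # pause the geo-replication session while changing the slave pools (cautious, not strictly required)
    gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup stop

    # on each slave node, switch to system-attribute-based xattr storage
    zfs set xattr=sa pcic-backup01-zpool     # run on pcic-backup01
    zfs set xattr=sa pcic-backup02-zpool     # run on pcic-backup02

    gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup start

Note that changing the xattr property only affects newly written attributes;
existing files keep whatever on-disk xattr format they already have.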