Kotresh Hiremath Ravishankar
2016-Dec-07 05:24 UTC
[Gluster-users] Geo-replication failed to delete from slave file partially written to master volume.
Hi Viktor,

Please share the geo-replication slave mount logs from the slave nodes.

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Viktor Nosov" <vnosov at stonefly.com>
> To: gluster-users at gluster.org
> Cc: vnosov at stonefly.com
> Sent: Tuesday, December 6, 2016 7:13:22 AM
> Subject: [Gluster-users] Geo-replication failed to delete from slave file partially written to master volume.
>
> Hi,
>
> I hit a problem while testing geo-replication. Does anybody know how to fix
> it, other than deleting and recreating the geo-replication session?
>
> Geo-replication failed to delete from the slave a file that was only
> partially written to the master volume.
>
> I have geo-replication between two nodes running glusterfs 3.7.16,
>
> with master volume:
>
> [root at SC-182 log]# gluster volume info master-for-183-0003
>
> Volume Name: master-for-183-0003
> Type: Distribute
> Volume ID: 84501a83-b07c-4768-bfaa-418b038e1a9e
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 10.10.60.182:/exports/nas-segment-0012/master-for-183-0003
> Options Reconfigured:
> changelog.changelog: on
> geo-replication.ignore-pid-check: on
> geo-replication.indexing: on
> server.allow-insecure: on
> performance.quick-read: off
> performance.stat-prefetch: off
> nfs.disable: on
> nfs.addr-namelookup: off
> performance.readdir-ahead: on
> cluster.enable-shared-storage: enable
> snap-activate-on-create: enable
>
> and slave volume:
>
> [root at SC-183 log]# gluster volume info rem-volume-0001
>
> Volume Name: rem-volume-0001
> Type: Distribute
> Volume ID: 7680de7a-d0e2-42f2-96a9-4da29adba73c
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 10.10.60.183:/exports/nas183-segment-0001/rem-volume-0001
> Options Reconfigured:
> performance.readdir-ahead: on
> nfs.addr-namelookup: off
> nfs.disable: on
> performance.stat-prefetch: off
> performance.quick-read: off
> server.allow-insecure: on
> snap-activate-on-create: enable
>
> The master volume is mounted on the node:
>
> [root at SC-182 log]# mount
> 127.0.0.1:/master-for-183-0003 on /samba/master-for-183-0003 type
> fuse.glusterfs (rw,allow_other,max_read=131072)
>
> Let's fill up the space on the master volume:
>
> [root at SC-182 log]# mkdir /samba/master-for-183-0003/cifs_share/dir3
> [root at SC-182 log]# cp big.file /samba/master-for-183-0003/cifs_share/dir3/
> [root at SC-182 log]# cp big.file /samba/master-for-183-0003/cifs_share/dir3/big.file.1
> cp: writing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': No space left on device
> cp: closing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': No space left on device
>
> File "big.file.1" contains only part of the original file:
>
> [root at SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/*
> -rwx------ 1 root root 78930370 Dec  5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file
> -rwx------ 1 root root 22155264 Dec  5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file.1
>
> Both new files are geo-replicated to the slave volume successfully:
>
> [root at SC-183 log]# ls -l /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/
> total 98720
> -rwx------ 2 root root 78930370 Dec  5 16:49 big.file
> -rwx------ 2 root root 22155264 Dec  5 16:49 big.file.1
>
> [root at SC-182 log]# /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep at 10.10.60.183::rem-volume-0001 status detail
>
> MASTER NODE:                  10.10.60.182
> MASTER VOL:                   master-for-183-0003
> MASTER BRICK:                 /exports/nas-segment-0012/master-for-183-0003
> SLAVE USER:                   nasgorep
> SLAVE:                        nasgorep at 10.10.60.183::rem-volume-0001
> SLAVE NODE:                   10.10.60.183
> STATUS:                       Active
> CRAWL STATUS:                 Changelog Crawl
> LAST_SYNCED:                  2016-12-05 16:49:48
> ENTRY / DATA / META / FAILURES:  0 / 0 / 0 / 0
> CHECKPOINT TIME:              N/A
> CHECKPOINT COMPLETED:         N/A
> CHECKPOINT COMPLETION TIME:   N/A
>
> Let's delete the partially written file from the master mount:
>
> [root at SC-182 log]# rm /samba/master-for-183-0003/cifs_share/dir3/big.file.1
> rm: remove regular file `/samba/master-for-183-0003/cifs_share/dir3/big.file.1'? y
>
> [root at SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/*
> -rwx------ 1 root root 78930370 Dec  5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file
>
> Set a checkpoint:
>
> 32643 12/05/2016 16:57:46.540390536 1480985866 command: /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep at 10.10.60.183::rem-volume-0001 config checkpoint now 2>&1
> 32643 12/05/2016 16:57:48.770820909 1480985868 status=0 /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep at 10.10.60.183::rem-volume-0001 config checkpoint now 2>&1
>
> Check the geo-replication status:
>
> [root at SC-182 log]# /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep at 10.10.60.183::rem-volume-0001 status detail
>
> MASTER NODE:                  10.10.60.182
> MASTER VOL:                   master-for-183-0003
> MASTER BRICK:                 /exports/nas-segment-0012/master-for-183-0003
> SLAVE USER:                   nasgorep
> SLAVE:                        nasgorep at 10.10.60.183::rem-volume-0001
> SLAVE NODE:                   10.10.60.183
> STATUS:                       Active
> CRAWL STATUS:                 Changelog Crawl
> LAST_SYNCED:                  2016-12-05 16:57:48
> ENTRY / DATA / META / FAILURES:  0 / 0 / 0 / 0
> CHECKPOINT TIME:              2016-12-05 16:57:46
> CHECKPOINT COMPLETED:         Yes
> CHECKPOINT COMPLETION TIME:   2016-12-05 16:57:50
>
> But the partially written file "big.file.1" is still present on the slave volume:
>
> [root at SC-183 log]# ls -l /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/
> total 98720
> -rwx------ 2 root root 78930370 Dec  5 16:49 big.file
> -rwx------ 2 root root 22155264 Dec  5 16:49 big.file.1
>
> The geo-replication logs give no indication of a failure to delete the file:
>
> [root at SC-182 log]# view /var/log/glusterfs/geo-replication/master-for-183-0003/ssh%3A%2F%2Fnasgorep%4010.10.60.183%3Agluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.log
>
> [2016-12-06 00:49:40.267956] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 17 crawls, 1 turns
> [2016-12-06 00:49:52.348413] I [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] _GMaster: slave's time: (1480985358, 0)
> [2016-12-06 00:49:53.296811] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:53.901186] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:54.760957] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:55.384705] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:55.987873] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:56.848361] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:57.471925] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:58.76416] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:58.935801] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
> [2016-12-06 00:49:59.560571] E [resource(/exports/nas-segment-0012/master-for-183-0003):1021:rsync] SSH: SYNC Error(Rsync): rsync: rsync_xal_set: lsetxattr(".gfid/103b87ff-3b7a-4f2b-8bc5-a2f9c1d3fc0e","trusted.glusterfs.84501a83-b07c-4768-bfaa-418b038e1a9e.xtime") failed: Operation not permitted (1)
> [2016-12-06 00:49:59.560972] E [master(/exports/nas-segment-0012/master-for-183-0003):1037:process] _GMaster: changelogs CHANGELOG.1480985389 could not be processed completely - moving on...
> [2016-12-06 00:50:41.839792] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 18 crawls, 1 turns
> [2016-12-06 00:51:42.203411] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:52:42.600800] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:53:42.983913] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:54:43.381218] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:55:43.749927] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:56:44.113914] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:57:44.494354] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
> [2016-12-06 00:57:48.528424] I [gsyncd(conf):671:main_i] <top>: checkpoint 1480985866 set
> [2016-12-06 00:57:48.528704] I [syncdutils(conf):220:finalize] <top>: exiting.
> [2016-12-06 00:57:50.530714] I [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] _GMaster: slave's time: (1480985388, 0)
> [2016-12-06 00:58:44.802122] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 1 turns
> [2016-12-06 00:59:45.181669] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
>
> Best regards,
>
> Viktor Nosov
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
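For anyone hitting the same symptom, two quick checks can help, sketched here under the assumption of default paths taken from the report above; these are not steps taken in the thread. First, since gsyncd gives up on a changelog after repeated retries and "moves on", the only trace of a silently skipped operation is the log line itself, so scan for affected changelogs:

[root at SC-182 ~]# grep "could not be processed completely" \
    /var/log/glusterfs/geo-replication/master-for-183-0003/ssh%3A%2F%2Fnasgorep%4010.10.60.183%3Agluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.log

Second, to confirm that the stale slave file is the same entry the failed changelog batch was carrying, read its GFID on the slave brick back-end (getfattr must run against the brick path, not the FUSE mount):

[root at SC-183 ~]# getfattr -n trusted.gfid -e hex \
    /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/big.file.1

If that GFID matches the one in the lsetxattr error above (103b87ff-3b7a-4f2b-8bc5-a2f9c1d3fc0e), the skipped changelog and the stale file likely refer to the same entry.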
Viktor Nosov
2016-Dec-07 17:18 UTC
[Gluster-users] Geo-replication failed to delete from slave file partially written to master volume.
Hi Kotresh,

Thanks for looking into this issue! I'm attaching the log files from the slave node, copied from /var/log/glusterfs/geo-replication-slaves/:

[root at SC-183 log]# cp /var/log/glusterfs/geo-replication-slaves/84501a83-b07c-4768-bfaa-418b038e1a9e\:gluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.gluster.log /home/vnosov/
[root at SC-183 log]# cp /var/log/glusterfs/geo-replication-slaves/slave.log /home/vnosov/
[root at SC-183 log]# cp /var/log/glusterfs/geo-replication-slaves/mbr/84501a83-b07c-4768-bfaa-418b038e1a9e\:gluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.log /home/vnosov/

Best regards,

Viktor Nosov
big.file.1" represent part of the original file: > [root at SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/* > -rwx------ 1 root root 78930370 Dec 5 16:49 > /samba/master-for-183-0003/cifs_share/dir3/big.file > -rwx------ 1 root root 22155264 Dec 5 16:49 > /samba/master-for-183-0003/cifs_share/dir3/big.file.1 > > Both new files are geo-replicated to the Slave volume successfully: > > [root at SC-183 log]# ls -l > /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/ > total 98720 > -rwx------ 2 root root 78930370 Dec 5 16:49 big.file > -rwx------ 2 root root 22155264 Dec 5 16:49 big.file.1 > > [root at SC-182 log]# /usr/sbin/gluster volume geo-replication > master-for-183-0003 nasgorep at 10.10.60.183::rem-volume-0001 status > detail > > MASTER NODE MASTER VOL MASTER BRICK > SLAVE USER SLAVE SLAVE NODE > STATUS > CRAWL STATUS LAST_SYNCED ENTRY DATA META FAILURES > CHECKPOINT TIME CHECKPOINT COMPLETED CHECKPOINT COMPLETION TIME > ---------------------------------------------------------------------- > ------ > ---------------------------------------------------------------------- > ------ > ---------------------------------------------------------------------- > ------ > ---------------------------------------------------------------------- > ------ > ------ > 10.10.60.182 master-for-183-0003 > /exports/nas-segment-0012/master-for-183-0003 nasgorep > nasgorep at 10.10.60.183::rem-volume-0001 10.10.60.183 Active > Changelog Crawl 2016-12-05 16:49:48 0 0 0 0 > N/A N/A N/A > > Let's delete partially written file from the master mount: > > [root at SC-182 log]# rm > /samba/master-for-183-0003/cifs_share/dir3/big.file.1 > rm: remove regular file > `/samba/master-for-183-0003/cifs_share/dir3/big.file.1'? y > > [root at SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/* > -rwx------ 1 root root 78930370 Dec 5 16:49 > /samba/master-for-183-0003/cifs_share/dir3/big.file > > Set checkpoint: > > 32643 12/05/2016 16:57:46.540390536 1480985866 command: > /usr/sbin/gluster volume geo-replication master-for-183-0003 > nasgorep at 10.10.60.183::rem-volume-0001 config checkpoint now 2>&1 > 32643 12/05/2016 16:57:48.770820909 1480985868 status=0 > /usr/sbin/gluster volume geo-replication master-for-183-0003 > nasgorep at 10.10.60.183::rem-volume-0001 config checkpoint now 2>&1 > > Check geo-replication status: > > [root at SC-182 log]# /usr/sbin/gluster volume geo-replication > master-for-183-0003 nasgorep at 10.10.60.183::rem-volume-0001 status > detail > > MASTER NODE MASTER VOL MASTER BRICK > SLAVE USER SLAVE SLAVE NODE > STATUS > CRAWL STATUS LAST_SYNCED ENTRY DATA META FAILURES > CHECKPOINT TIME CHECKPOINT COMPLETED CHECKPOINT COMPLETION TIME > ---------------------------------------------------------------------- > ------ > ---------------------------------------------------------------------- > ------ > ---------------------------------------------------------------------- > ------ > ---------------------------------------------------------------------- > ------ > ---------- > 10.10.60.182 master-for-183-0003 > /exports/nas-segment-0012/master-for-183-0003 nasgorep > nasgorep at 10.10.60.183::rem-volume-0001 10.10.60.183 Active > Changelog Crawl 2016-12-05 16:57:48 0 0 0 0 > 2016-12-05 16:57:46 Yes 2016-12-05 16:57:50 > > But the partially written file "big.file.1" is still present on the > slave > volume: > > [root at SC-183 log]# ls -l > /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/ > total 98720 > -rwx------ 2 root root 78930370 Dec 5 16:49 
big.file > -rwx------ 2 root root 22155264 Dec 5 16:49 big.file.1 > > Gluster logs for geo-replication do not have any indication about > failure to delete the file: > > [root at SC-182 log]# view > /var/log/glusterfs/geo-replication/master-for-183-0003/ssh%3A%2F%2Fnas > gorep% > 4010.10.60.183%3Agluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.log > > [2016-12-06 00:49:40.267956] I > [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] > _GMaster: 17 crawls, 1 turns > [2016-12-06 00:49:52.348413] I > [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] _GMaster: > slave's time: (1480985358, 0) > [2016-12-06 00:49:53.296811] W > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > [2016-12-06 00:49:53.901186] W > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > [2016-12-06 00:49:54.760957] W > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > [2016-12-06 00:49:55.384705] W > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > [2016-12-06 00:49:55.987873] W > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > [2016-12-06 00:49:56.848361] W > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > [2016-12-06 00:49:57.471925] W > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > [2016-12-06 00:49:58.76416] W > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > [2016-12-06 00:49:58.935801] W > [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] > _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389 > [2016-12-06 00:49:59.560571] E > [resource(/exports/nas-segment-0012/master-for-183-0003):1021:rsync] SSH: > SYNC Error(Rsync): rsync: rsync_xal_set: > lsetxattr(".gfid/103b87ff-3b7a-4f2b-8bc5-a2f9c1d3fc0e","trusted.gluste > rfs.84 > 501a83-b07c-4768-bfaa-418b038e1a9e.xtime") failed: Operation not > permitted > (1) > [2016-12-06 00:49:59.560972] E > [master(/exports/nas-segment-0012/master-for-183-0003):1037:process] > _GMaster: changelogs CHANGELOG.1480985389 could not be processed > completely - moving on... 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 84LPUG~L.LOG
Type: application/octet-stream
Size: 200 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20161207/3c74a181/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: slave.log
Type: application/octet-stream
Size: 225810 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20161207/3c74a181/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8RSDWG~T.LOG
Type: application/octet-stream
Size: 9392 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20161207/3c74a181/attachment-0002.obj>
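Until the root cause is found, one possible manual cleanup of the missed delete (an assumption, not advice given in the thread) is to mount the slave volume and remove the stale file through the mount, so that the matching GFID hard link under .glusterfs on the brick is removed as well; deleting the file straight out of the brick directory would leave that hard link behind:

[root at SC-183 ~]# mkdir -p /mnt/rem-volume-0001
[root at SC-183 ~]# mount -t glusterfs 127.0.0.1:/rem-volume-0001 /mnt/rem-volume-0001
[root at SC-183 ~]# rm /mnt/rem-volume-0001/cifs_share/dir3/big.file.1
[root at SC-183 ~]# umount /mnt/rem-volume-0001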
Kotresh Hiremath Ravishankar
2016-Dec-09 11:03 UTC
[Gluster-users] Geo-replication failed to delete from slave file partially written to master volume.
Hi Viktor,

I went through the slave logs, and there are no errors with respect to deletion of the files. I suspect the changelog missed recording the deletion of the file, which is why the delete was never processed on the slave. Please share the following logs from the master volume so the issue can be root-caused:

1. Master geo-replication logs:
   /var/log/glusterfs/geo-replication/<mastervolname>/*.log
2. Master brick logs:
   /var/log/glusterfs/bricks/*.log
3. Changelogs from the master volume:
   /exports/nas-segment-0012/master-for-183-0003/.glusterfs/changelogs/*

Thanks and Regards,
Kotresh H R
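A sketch for gathering everything requested above into a single archive on the master node, assuming the paths listed (the changelogs directory can be large):

[root at SC-182 ~]# tar czf /tmp/master-georep-logs.tar.gz \
    /var/log/glusterfs/geo-replication/master-for-183-0003/ \
    /var/log/glusterfs/bricks/ \
    /exports/nas-segment-0012/master-for-183-0003/.glusterfs/changelogs/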