Viktor Nosov
2016-Dec-06 01:43 UTC
[Gluster-users] Geo-replication failed to delete from slave file partially written to master volume.
Hi,

I hit a problem while testing geo-replication. Does anybody know how to fix it
other than deleting and recreating the geo-replication session?

Geo-replication failed to delete from the slave a file that had been only
partially written to the master volume.

I have geo-replication between two nodes running glusterfs 3.7.16,

with the master volume:

[root@SC-182 log]# gluster volume info master-for-183-0003

Volume Name: master-for-183-0003
Type: Distribute
Volume ID: 84501a83-b07c-4768-bfaa-418b038e1a9e
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.10.60.182:/exports/nas-segment-0012/master-for-183-0003
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
server.allow-insecure: on
performance.quick-read: off
performance.stat-prefetch: off
nfs.disable: on
nfs.addr-namelookup: off
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
snap-activate-on-create: enable

and the slave volume:

[root@SC-183 log]# gluster volume info rem-volume-0001

Volume Name: rem-volume-0001
Type: Distribute
Volume ID: 7680de7a-d0e2-42f2-96a9-4da29adba73c
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.10.60.183:/exports/nas183-segment-0001/rem-volume-0001
Options Reconfigured:
performance.readdir-ahead: on
nfs.addr-namelookup: off
nfs.disable: on
performance.stat-prefetch: off
performance.quick-read: off
server.allow-insecure: on
snap-activate-on-create: enable

The master volume is mounted on the node:

[root@SC-182 log]# mount
127.0.0.1:/master-for-183-0003 on /samba/master-for-183-0003 type fuse.glusterfs (rw,allow_other,max_read=131072)

Let's fill up the space on the master volume:

[root@SC-182 log]# mkdir /samba/master-for-183-0003/cifs_share/dir3
[root@SC-182 log]# cp big.file /samba/master-for-183-0003/cifs_share/dir3/
[root@SC-182 log]# cp big.file /samba/master-for-183-0003/cifs_share/dir3/big.file.1
cp: writing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': No space left on device
cp: closing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': No space left on device

The file "big.file.1" contains only part of the original file:

[root@SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/*
-rwx------ 1 root root 78930370 Dec  5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file
-rwx------ 1 root root 22155264 Dec  5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file.1

Both new files are geo-replicated to the slave volume successfully:

[root@SC-183 log]# ls -l /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/
total 98720
-rwx------ 2 root root 78930370 Dec  5 16:49 big.file
-rwx------ 2 root root 22155264 Dec  5 16:49 big.file.1

[root@SC-182 log]# /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 status detail

MASTER NODE     MASTER VOL             MASTER BRICK                                     SLAVE USER    SLAVE                                     SLAVE NODE      STATUS    CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME        CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.10.60.182    master-for-183-0003    /exports/nas-segment-0012/master-for-183-0003    nasgorep      nasgorep@10.10.60.183::rem-volume-0001    10.10.60.183    Active    Changelog Crawl    2016-12-05 16:49:48    0        0       0       0           N/A                    N/A                     N/A
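The slave-side listing above was taken directly on the brick. To confirm the
partially written file is byte-identical on both sides, not merely present,
checksums can be compared; a minimal sketch using the paths from this setup:

# On the master node SC-182, through the FUSE mount:
md5sum /samba/master-for-183-0003/cifs_share/dir3/big.file.1
# On the slave node SC-183, directly on the brick:
md5sum /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/big.file.1
# The two digests should match if geo-replication synced the partial file intact.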
Let's delete the partially written file from the master mount:

[root@SC-182 log]# rm /samba/master-for-183-0003/cifs_share/dir3/big.file.1
rm: remove regular file `/samba/master-for-183-0003/cifs_share/dir3/big.file.1'? y

[root@SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/*
-rwx------ 1 root root 78930370 Dec  5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file

Set a checkpoint:

32643 12/05/2016 16:57:46.540390536 1480985866 command: /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 config checkpoint now 2>&1
32643 12/05/2016 16:57:48.770820909 1480985868 status=0 /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 config checkpoint now 2>&1
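Rather than re-running "status detail" by hand after setting a checkpoint, the
session can be polled until the checkpoint completes. A rough sketch; the grep
for a literal "Yes" in the CHECKPOINT COMPLETED column is an assumption about
the output layout and may need tightening:

while ! /usr/sbin/gluster volume geo-replication master-for-183-0003 \
        nasgorep@10.10.60.183::rem-volume-0001 status detail 2>/dev/null \
        | grep -q 'Yes'; do
    sleep 10    # poll every 10 seconds until CHECKPOINT COMPLETED shows Yes
done
echo "checkpoint completed"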
Check the geo-replication status:

[root@SC-182 log]# /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 status detail

MASTER NODE     MASTER VOL             MASTER BRICK                                     SLAVE USER    SLAVE                                     SLAVE NODE      STATUS    CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME        CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.10.60.182    master-for-183-0003    /exports/nas-segment-0012/master-for-183-0003    nasgorep      nasgorep@10.10.60.183::rem-volume-0001    10.10.60.183    Active    Changelog Crawl    2016-12-05 16:57:48    0        0       0       0           2016-12-05 16:57:46    Yes                     2016-12-05 16:57:50

But the partially written file "big.file.1" is still present on the slave volume:

[root@SC-183 log]# ls -l /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/
total 98720
-rwx------ 2 root root 78930370 Dec  5 16:49 big.file
-rwx------ 2 root root 22155264 Dec  5 16:49 big.file.1

The geo-replication logs have no indication of a failure to delete the file:

[root@SC-182 log]# view /var/log/glusterfs/geo-replication/master-for-183-0003/ssh%3A%2F%2Fnasgorep%4010.10.60.183%3Agluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.log

[2016-12-06 00:49:40.267956] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 17 crawls, 1 turns
[2016-12-06 00:49:52.348413] I [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] _GMaster: slave's time: (1480985358, 0)
[2016-12-06 00:49:53.296811] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:53.901186] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:54.760957] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:55.384705] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:55.987873] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:56.848361] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:57.471925] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:58.76416] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:58.935801] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:59.560571] E [resource(/exports/nas-segment-0012/master-for-183-0003):1021:rsync] SSH: SYNC Error(Rsync): rsync: rsync_xal_set: lsetxattr(".gfid/103b87ff-3b7a-4f2b-8bc5-a2f9c1d3fc0e","trusted.glusterfs.84501a83-b07c-4768-bfaa-418b038e1a9e.xtime") failed: Operation not permitted (1)
[2016-12-06 00:49:59.560972] E [master(/exports/nas-segment-0012/master-for-183-0003):1037:process] _GMaster: changelogs CHANGELOG.1480985389 could not be processed completely - moving on...
[2016-12-06 00:50:41.839792] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 18 crawls, 1 turns
[2016-12-06 00:51:42.203411] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:52:42.600800] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:53:42.983913] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:54:43.381218] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:55:43.749927] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:56:44.113914] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:57:44.494354] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:57:48.528424] I [gsyncd(conf):671:main_i] <top>: checkpoint 1480985866 set
[2016-12-06 00:57:48.528704] I [syncdutils(conf):220:finalize] <top>: exiting.
[2016-12-06 00:57:50.530714] I [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] _GMaster: slave's time: (1480985388, 0)
[2016-12-06 00:58:44.802122] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 1 turns
[2016-12-06 00:59:45.181669] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns

Best regards,

Viktor Nosov
Kotresh Hiremath Ravishankar
2016-Dec-07 05:24 UTC
[Gluster-users] Geo-replication failed to delete from slave file partially written to master volume.
Hi Viktor,

Please share the geo-replication slave mount logs from the slave nodes.

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Viktor Nosov" <vnosov@stonefly.com>
> To: gluster-users@gluster.org
> Cc: vnosov@stonefly.com
> Sent: Tuesday, December 6, 2016 7:13:22 AM
> Subject: [Gluster-users] Geo-replication failed to delete from slave file partially written to master volume.
>
> [snip]
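On a 3.7-series slave, the gsyncd and slave mount logs for a session normally
live under /var/log/glusterfs/geo-replication-slaves/ (an assumption about the
default packaging; adjust if your build logs elsewhere). A minimal sketch for
collecting them, and for inspecting the xattrs that the rsync error on the
master failed to set; the archive filename is hypothetical:

# On the slave node SC-183:
ls -l /var/log/glusterfs/geo-replication-slaves/
tar czf /tmp/geo-rep-slave-logs.tar.gz /var/log/glusterfs/geo-replication-slaves/

# The master-side E-level rsync error shows lsetxattr of
# trusted.glusterfs.<volume-id>.xtime failing with EPERM. Writing xattrs in
# the trusted.* namespace requires CAP_SYS_ADMIN, which a non-root sync user
# such as "nasgorep" would lack, so that may be related. Dump what actually
# reached the slave copy:
getfattr -d -m . -e hex /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/big.file.1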