Dietmar Putz
2018-Mar-12 16:43 UTC
[Gluster-users] trashcan on dist. repl. volume with geo-replication
Hello,

in regard to https://bugzilla.redhat.com/show_bug.cgi?id=1434066 I have run into another issue when using the trashcan feature on a distributed replicated volume running a geo-replication (GlusterFS 3.12.6 on Ubuntu 16.04.4).

For example, removing an entire directory with subfolders:
tron at gl-node1:/myvol-1/test1/b1$ rm -rf *

and afterwards listing the files in the trashcan:
tron at gl-node1:/myvol-1/test1$ ls -la /myvol-1/.trashcan/test1/b1/

leads to an outage of the geo-replication.

Error on master-01 and master-02:

[2018-03-12 13:37:14.827204] I [master(/brick1/mvol1):1385:crawl] _GMaster: slave's time stime=(1520861818, 0)
[2018-03-12 13:37:14.835535] E [master(/brick1/mvol1):784:log_failures] _GMaster: ENTRY FAILED    data=({'uid': 0, 'gfid': 'c38f75e3-194a-4d22-9094-50ac8f8756e7', 'gid': 0, 'mode': 16877, 'entry': '.gfid/5531bd64-ac50-462b-943e-c0bf1c52f52c/Oracle_VM_VirtualBox_Extension', 'op': 'MKDIR'}, 2, {'gfid_mismatch': False, 'dst': False})
[2018-03-12 13:37:14.835911] E [syncdutils(/brick1/mvol1):299:log_raise_exception] <top>: The above directory failed to sync. Please fix it to proceed further.

The gfids of both directories as shown in the log:

brick1/mvol1/.trashcan/test1/b1                                   0x5531bd64ac50462b943ec0bf1c52f52c
brick1/mvol1/.trashcan/test1/b1/Oracle_VM_VirtualBox_Extension    0xc38f75e3194a4d22909450ac8f8756e7

The directory shown contains just one file, which is stored on gl-node3 and gl-node4, while node1 and node2 are in geo-replication error.
Since the file-size limitation of the trashcan is obsolete, I am really interested in using the trashcan feature, but I am concerned that it will interrupt the geo-replication entirely.
Has anybody else been faced with this situation... any hints, workarounds?

best regards
Dietmar Putz


root at gl-node1:~/tmp# gluster volume info mvol1

Volume Name: mvol1
Type: Distributed-Replicate
Volume ID: a1c74931-568c-4f40-8573-dd344553e557
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gl-node1-int:/brick1/mvol1
Brick2: gl-node2-int:/brick1/mvol1
Brick3: gl-node3-int:/brick1/mvol1
Brick4: gl-node4-int:/brick1/mvol1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.trash-max-filesize: 2GB
features.trash: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

root at gl-node1:/myvol-1/test1# gluster volume geo-replication mvol1 gl-node5-int::mvol1 config
special_sync_mode: partial
gluster_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
change_detector: changelog
use_meta_volume: true
session_owner: a1c74931-568c-4f40-8573-dd344553e557
state_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.status
gluster_params: aux-gfid-mount acl
remote_gsyncd: /nonexistent/gsyncd
working_dir: /var/lib/misc/glusterfsd/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1
state_detail_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-detail.status
gluster_command_dir: /usr/sbin/
pid_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.pid
georep_session_working_dir: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/
ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
master.stime_xattr_name: trusted.glusterfs.a1c74931-568c-4f40-8573-dd344553e557.d62bda3a-1396-492a-ad99-7c6238d93c6a.stime
changelog_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-changes.log
socketdir: /var/run/gluster
volume_id: a1c74931-568c-4f40-8573-dd344553e557
ignore_deletes: false
state_socket_unencoded: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.socket
log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.log
access_mount: true
root at gl-node1:/myvol-1/test1#
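For reference, the path-to-gfid mapping listed above can be cross-checked directly on one of the affected master bricks (run as root on gl-node1 or gl-node2), e.g.:

# read the gfid xattrs of the trashed directories directly from the brick
getfattr -n trusted.gfid -e hex /brick1/mvol1/.trashcan/test1/b1
getfattr -n trusted.gfid -e hex /brick1/mvol1/.trashcan/test1/b1/Oracle_VM_VirtualBox_Extension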
Kotresh Hiremath Ravishankar
2018-Mar-13 05:38 UTC
[Gluster-users] trashcan on dist. repl. volume with geo-replication
Hi Dietmar,

I am trying to understand the problem and have a few questions:

1. Is trashcan enabled only on the master volume?
2. Is the 'rm -rf' done on the master volume synced to the slave?
3. If trashcan is disabled, does the issue go away?

The geo-rep error just says that it failed to create the directory "Oracle_VM_VirtualBox_Extension" on the slave. Usually this would be because of a gfid mismatch, but I don't see that in your case. So I am a little more interested in the present state of the geo-rep: is it still throwing the same errors and failing to sync the same directory? If so, does the parent 'test1/b1' exist on the slave?

Doing ls on the trashcan should not affect geo-rep. Is there an easy reproducer for this?
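If it is still failing, something along these lines would help confirm the state on the slave side (I am assuming the slave bricks also use the /brick1/mvol1 path; adjust if not):

# on a slave node (gl-node5), check the trashcan setting of the slave volume
gluster volume get mvol1 features.trash

# on a slave brick, check whether 'test1/b1' exists and which gfid it carries
ls -ld /brick1/mvol1/test1/b1
getfattr -n trusted.gfid -e hex /brick1/mvol1/test1/b1

# the parent gfid from the log can also be looked up directly; directory gfids
# appear as symlinks under .glusterfs/<xx>/<yy>/<gfid> inside the brick
ls -l /brick1/mvol1/.glusterfs/55/31/5531bd64-ac50-462b-943e-c0bf1c52f52c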
Thanks,
Kotresh HR

On Mon, Mar 12, 2018 at 10:13 PM, Dietmar Putz <dietmar.putz at 3qsdn.com> wrote:
> [...]

--
Thanks and Regards,
Kotresh H R
Dietmar Putz
2018-Mar-13 09:13 UTC
[Gluster-users] trashcan on dist. repl. volume with geo-replication
Hi Kotresh,

thanks for your response... answers inline...

best regards
Dietmar

On 13.03.2018 at 06:38, Kotresh Hiremath Ravishankar wrote:
> Hi Dietmar,
>
> I am trying to understand the problem and have a few questions.
>
> 1. Is trashcan enabled only on the master volume?

No, trashcan is also enabled on the slave. The settings are the same as on the master, but the trashcan on the slave is completely empty.

root at gl-node5:~# gluster volume get mvol1 all | grep -i trash
features.trash                          on
features.trash-dir                      .trashcan
features.trash-eliminate-path           (null)
features.trash-max-filesize             2GB
features.trash-internal-op              off
root at gl-node5:~#

> 2. Is the 'rm -rf' done on the master volume synced to the slave?

Yes. The entire content of ~/test1/b1/* on the slave has been removed.

> 3. If trashcan is disabled, does the issue go away?

After disabling features.trash on master and slave the issue remains... stopping and restarting the master/slave volumes and the geo-replication has no effect.

root at gl-node1:~# gluster volume geo-replication mvol1 gl-node5-int::mvol1 status

MASTER NODE     MASTER VOL    MASTER BRICK     SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------
gl-node1-int    mvol1         /brick1/mvol1    root          gl-node5-int::mvol1    N/A             Faulty     N/A                N/A
gl-node3-int    mvol1         /brick1/mvol1    root          gl-node5-int::mvol1    gl-node7-int    Passive    N/A                N/A
gl-node2-int    mvol1         /brick1/mvol1    root          gl-node5-int::mvol1    N/A             Faulty     N/A                N/A
gl-node4-int    mvol1         /brick1/mvol1    root          gl-node5-int::mvol1    gl-node8-int    Active     Changelog Crawl    2018-03-12 13:56:28

root at gl-node1:~#

> The geo-rep error just says that it failed to create the directory
> "Oracle_VM_VirtualBox_Extension" on the slave. Usually this would be because
> of a gfid mismatch, but I don't see that in your case. So I am a little more
> interested in the present state of the geo-rep: is it still throwing the same
> errors and failing to sync the same directory? If so, does the parent
> 'test1/b1' exist on the slave?

It is still throwing the same error as shown in my first mail. The directory 'test1/b1' is empty as expected and exists on master and slave.

> Doing ls on the trashcan should not affect geo-rep. Is there an easy
> reproducer for this?

I have made several tests on 3.10.11 and 3.12.6 and I'm pretty sure there was one without activation of the trashcan feature on the slave... with the same / similar problems. I will come back with a more comprehensive and reproducible description of that issue...
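In the meantime, a rough sketch of the reproduction as described in my first mail (run on the master client mount /myvol-1; the test directory and file names below are only placeholders):

# create some test data on the fuse mount of the master volume
mkdir -p /myvol-1/test1/b1/testdir
dd if=/dev/zero of=/myvol-1/test1/b1/testdir/file1 bs=1M count=10

# remove it and then list the trashcan
cd /myvol-1/test1/b1 && rm -rf *
ls -la /myvol-1/.trashcan/test1/b1/

# afterwards the sessions on gl-node1/gl-node2 go faulty; check status and the gsyncd logs
gluster volume geo-replication mvol1 gl-node5-int::mvol1 status
tail -n 50 /var/log/glusterfs/geo-replication/mvol1/*.log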
--
Dietmar Putz
3Q GmbH
Kurfürstendamm 102
D-10711 Berlin

Mobile: +49 171 / 90 160 39
Mail: dietmar.putz at 3qsdn.com