Dietmar Putz
2018-Jan-24 16:59 UTC
[Gluster-users] geo-replication command rsync returned with 3
Hi all,

I have run some tests on the latest Ubuntu 16.04.3 server image, with upgrades disabled. The configuration was always the same: a distributed-replicated volume on 4 VMs with geo-replication to a distributed-replicated volume on 4 VMs. I started with 3.7.20 and upgraded to 3.8.15, then to 3.10.9, then to 3.12.5. After each upgrade I tested geo-replication, and it worked every time.

Then I ran an update/upgrade on the first master node. Directly after the upgrade, the error shown below appeared on that node. After upgrading the second master node, the error appeared there as well, and geo-replication is faulty.

This error affects GlusterFS 3.7.20, 3.8.15, 3.10.9 and 3.12.5 on Ubuntu 16.04.3. In one test I updated rsync from 3.1.1 to 3.1.2, with no effect.

Has anyone else experienced this behavior? Any ideas?

Best regards
Dietmar

GlusterFS 3.12.5 geo-rep log on master:

    [2018-01-24 15:50:35.347959] I [master(/brick1/mvol1):1385:crawl] _GMaster: slave's time stime=(1516808792, 0)
    [2018-01-24 15:50:35.604094] I [master(/brick1/mvol1):1863:syncjob] Syncer: Sync Time Taken duration=0.0294 num_files=1 job=2 return_code=3
    [2018-01-24 15:50:35.605490] E [resource(/brick1/mvol1):210:errlog] Popen: command returned error cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-MZwEp2/cbad1c5f88978ecd713bdb1478fbabbe.sock --compress root@gl-node5-int:/proc/2013/cwd error=3
    [2018-01-24 15:50:35.628978] I [syncdutils(/brick1/mvol1):271:finalize] <top>: exiting.

After this upgrade, one server fails:

    Start-Date: 2018-01-18 04:33:52
    Commandline: /usr/bin/unattended-upgrade
    Upgrade:
      libdns-export162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 1:9.10.3.dfsg.P4-8ubuntu1.10),
      libisccfg140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 1:9.10.3.dfsg.P4-8ubuntu1.10),
      bind9-host:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 1:9.10.3.dfsg.P4-8ubuntu1.10),
      dnsutils:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 1:9.10.3.dfsg.P4-8ubuntu1.10),
      libc6:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
      libisc160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 1:9.10.3.dfsg.P4-8ubuntu1.10),
      locales:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
      libisc-export160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 1:9.10.3.dfsg.P4-8ubuntu1.10),
      libc-bin:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
      liblwres141:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 1:9.10.3.dfsg.P4-8ubuntu1.10),
      libdns162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 1:9.10.3.dfsg.P4-8ubuntu1.10),
      multiarch-support:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
      libisccc140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 1:9.10.3.dfsg.P4-8ubuntu1.10),
      libbind9-140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 1:9.10.3.dfsg.P4-8ubuntu1.10)
    End-Date: 2018-01-18 04:34:32

strace of rsync:

    30743 23:34:47 newfstatat(3, "6737", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
    30743 23:34:47 newfstatat(3, "6741", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
    30743 23:34:47 getdents(3, /* 0 entries */, 131072) = 0
    30743 23:34:47 munmap(0x7fa4feae7000, 135168) = 0
    30743 23:34:47 close(3) = 0
    30743 23:34:47 write(2, "rsync: getcwd(): No such file or directory (2)", 46) = 46
    30743 23:34:47 write(2, "\n", 1) = 1
    30743 23:34:47 rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x7fa4fdf404b0}, NULL, 8) = 0
    30743 23:34:47 rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x7fa4fdf404b0}, NULL, 8) = 0
    30743 23:34:47 write(2, "rsync error: errors selecting input/output files, dirs (code 3) at util.c(1056) [Receiver=3.1.1]", 96) = 96
    30743 23:34:47 write(2, "\n", 1) = 1
    30743 23:34:47 exit_group(3) = ?
    30743 23:34:47 +++ exited with 3 +++

On 19.01.2018 at 17:27, Joe Julian wrote:
> ubuntu 16.04

--
Dietmar Putz
3Q GmbH
Kurfürstendamm 102
D-10711 Berlin

Mobile: +49 171 / 90 160 39
Mail: dietmar.putz at 3qsdn.com
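To triage log entries like the Syncer line in the geo-rep log above, a minimal Python sketch can pull the key=value fields out of the line and translate `return_code` into rsync's documented exit-value text. The helper names are hypothetical, the table is only a subset of the EXIT VALUES list in the rsync(1) man page, and the parser assumes the whitespace-separated `key=value` layout seen in this log.

```python
import re

# Subset of the exit values documented in the rsync(1) man page.
RSYNC_EXIT_CODES = {
    0: "Success",
    1: "Syntax or usage error",
    2: "Protocol incompatibility",
    3: "Errors selecting input/output files, dirs",
    5: "Error starting client-server protocol",
    10: "Error in socket I/O",
    11: "Error in file I/O",
    12: "Error in rsync protocol data stream",
    23: "Partial transfer due to error",
    24: "Partial transfer due to vanished source files",
    30: "Timeout in data send/receive",
}

# key=value pairs as they appear in gsyncd log lines, e.g. "return_code=3".
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_syncjob(line):
    """Return the key=value fields of a 'Sync Time Taken' line, else None."""
    if "Sync Time Taken" not in line:
        return None
    return dict(FIELD_RE.findall(line))


def explain_return_code(line):
    """Return (code, description) for the rsync return_code in a Syncer line."""
    fields = parse_syncjob(line)
    if fields is None or "return_code" not in fields:
        return None
    code = int(fields["return_code"])
    return code, RSYNC_EXIT_CODES.get(code, "see rsync(1) EXIT VALUES")
```

For the log above, `explain_return_code` maps `return_code=3` to "Errors selecting input/output files, dirs", which matches the "(code 3)" message rsync itself writes in the strace.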
Kotresh Hiremath Ravishankar
2018-Jan-25 03:14 UTC
It is clear that rsync is failing. Are the rsync versions the same on all master and slave nodes? I have seen that cause problems sometimes.

-Kotresh HR

--
Thanks and Regards,
Kotresh H R
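Kotresh's question can be checked from a single node. A minimal sketch, assuming passwordless ssh from the machine running it to every master and slave node; the helper names and any host names you pass in are hypothetical. It parses the first line of `rsync --version`, which has the form `rsync  version 3.1.1  protocol version 31`.

```python
import re
import subprocess

# First line of `rsync --version`, e.g. "rsync  version 3.1.1  protocol version 31".
VERSION_RE = re.compile(r"rsync\s+version\s+(\S+)\s+protocol version\s+(\d+)")


def parse_rsync_version(output):
    """Return (version, protocol) from `rsync --version` output, or None."""
    m = VERSION_RE.search(output)
    return (m.group(1), int(m.group(2))) if m else None


def remote_rsync_version(host):
    """Run `rsync --version` on one node over ssh (assumes key-based login)."""
    result = subprocess.run(["ssh", host, "rsync", "--version"],
                            capture_output=True, text=True, check=True)
    return parse_rsync_version(result.stdout)


def mismatched(versions):
    """Given {host: (version, protocol)}, return True if any nodes disagree."""
    return len(set(versions.values())) > 1
```

Usage would be along the lines of `mismatched({h: remote_rsync_version(h) for h in hosts})`. Note that `rsync --version` reports only the upstream version, so Ubuntu package revisions such as 3.1.1-3ubuntu1 and 3.1.1-3ubuntu1.2 look identical; `dpkg -l rsync` tells them apart.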
Dietmar Putz
2018-Jan-25 13:06 UTC
Hi Kotresh,

thanks for your response. I have made further tests based on Ubuntu 16.04.3 (latest upgrades) and GlusterFS 3.12.5 with the following rsync versions:

1. ii  rsync  3.1.1-3ubuntu1
2. ii  rsync  3.1.1-3ubuntu1.2
3. ii  rsync  3.1.2-2ubuntu0.1

In each test, all nodes had the same rsync version installed. All tests failed with the error shown in my previous mail.

Then I started a test with the same setup, but this time upgrades for Ubuntu 16.04.3 were disabled. Again I tested the rsync versions listed above on all nodes, and geo-replication worked fine in every case. Then I installed 3.1.1-3ubuntu1.2 on each node, since that is the version installed by the latest Ubuntu upgrades. Afterwards I upgraded only the slave nodes with the latest Ubuntu upgrades and rebooted; geo-replication still worked fine. After upgrading and rebooting the master nodes with the latest Ubuntu upgrades, the error appeared again and geo-replication is broken.

I believe one of the packages listed in my previous mail (the unattended-upgrade run of 2018-01-18) caused the error, because the error appeared on each master node directly after those packages were upgraded there.

Has anyone else experienced this issue? Any help would be appreciated.

Best regards
Dietmar
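Dietmar's bisection narrows the cause to the 2018-01-18 unattended-upgrade run. A minimal sketch for inspecting such a run parses the `Upgrade:` entry of an apt history log into `(package, arch, old, new)` tuples and groups packages by version change. Applied to the list in the first mail, this splits it into two sets: the BIND 9.10.3 libraries and the glibc 2.23 packages (libc6, libc-bin, locales, multiarch-support). The regex assumes the `package:arch (old, new)` layout quoted in this thread; the helper names are hypothetical.

```python
import re

# One "package:arch (old, new)" entry from an apt history.log "Upgrade:" line.
ENTRY_RE = re.compile(r"([\w.+-]+):(\w+)\s*\(([^,()]+),\s*([^()]+)\)")


def parse_upgrades(text):
    """Return [(package, arch, old_version, new_version), ...] in log order."""
    return [tuple(part.strip() for part in m.groups())
            for m in ENTRY_RE.finditer(text)]


def group_by_change(entries):
    """Group package names by their (old_version, new_version) pair."""
    groups = {}
    for package, _arch, old, new in entries:
        groups.setdefault((old, new), []).append(package)
    return groups
```

Grouping by version pair is a cheap way to see which source packages moved together, which is the granularity at which one would pin or roll back to test a hypothesis.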
Florian Weimer
2018-Feb-05 12:33 UTC
(resending, sorry for duplicates)

On 01/24/2018 05:59 PM, Dietmar Putz wrote:
> strace rsync :
> [...]
> 30743 23:34:47 write(2, "rsync: getcwd(): No such file or directory (2)", 46) = 46
> [...]
> 30743 23:34:47 exit_group(3) = ?
> 30743 23:34:47 +++ exited with 3 +++

Do you have strace output going further back, at least to the preceding getcwd call? It would be interesting to see which path the kernel reports, and whether it starts with "(unreachable)".

Thanks,
Florian
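Florian's question can be answered mechanically from a longer trace. A minimal sketch scans strace output for `getcwd` calls and flags any result that starts with "(unreachable)". It assumes strace renders a successful call as `getcwd("/path", 4096) = N` and a failed one with a pointer instead of a string; the function name and the sample lines in the test are hypothetical, not taken from Dietmar's trace.

```python
import re

# Assumed strace rendering: getcwd("/some/path", 4096) = 12 on success,
# getcwd(0x7ffc..., 4096) = -1 ENOENT (...) on failure.
GETCWD_RE = re.compile(
    r'getcwd\((?:"(?P<path>[^"]*)"|[^,]+),\s*\d+\)\s*=\s*(?P<ret>-?\d+)')


def getcwd_results(lines):
    """Return (path, return_value, unreachable) for each getcwd call found."""
    results = []
    for line in lines:
        m = GETCWD_RE.search(line)
        if m is None:
            continue
        path = m.group("path")  # None when strace printed a pointer instead
        unreachable = path is not None and path.startswith("(unreachable)")
        results.append((path, int(m.group("ret")), unreachable))
    return results
```

Feeding it the lines of an `strace -f` log (for example `getcwd_results(open(path))`) lists every getcwd the traced processes made, so the one preceding the rsync error can be inspected directly.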
Florian Weimer
2018-Feb-05 19:07 UTC
On 02/05/2018 01:33 PM, Florian Weimer wrote:
> Do you have strace output going further back, at least to the proceeding
> getcwd call?  It would be interesting to see which path the kernel
> reports, and if it starts with "(unreachable)".

I got the strace output now, but it is very difficult to read (chdir in a multi-threaded process?).

My current inclination is to blame rsync, because it does an unconditional getcwd during startup, which now fails if the current directory is unreachable.

Further references:
https://sourceware.org/ml/libc-alpha/2018-02/msg00152.html
https://bugzilla.redhat.com/show_bug.cgi?id=1542180

Andreas Schwab agrees that rsync is buggy:
https://sourceware.org/ml/libc-alpha/2018-02/msg00153.html

Thanks,
Florian
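The failure mode Florian describes can be illustrated with a small sketch, assuming (as the thread implies) that the libc6 update from 2.23-0ubuntu9 to 2.23-0ubuntu10 in the unattended-upgrade run is what made getcwd stricter. The first function is a hypothetical model of the described behaviour (a kernel-reported path that is not absolute, such as "(unreachable)/", now yields ENOENT), not glibc's code. The second is a runnable demonstration of a related case that needs no root: on Linux, getcwd in a directory that has been deleted fails with ENOENT, the same errno 2 as in rsync's "getcwd(): No such file or directory (2)". Reproducing the actual unreachable case (a current directory outside the process root, e.g. on a lazily unmounted mount) needs root and is not attempted here.

```python
import errno
import os
import tempfile


def getcwd_like_new_glibc(raw_kernel_path):
    """Hypothetical model: reject a kernel getcwd result that is not absolute.

    Mirrors the behaviour described in the thread, where a path such as
    "(unreachable)/" now makes getcwd() fail with ENOENT.
    """
    if not raw_kernel_path.startswith("/"):
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT))
    return raw_kernel_path


def getcwd_in_deleted_dir():
    """Return the errno os.getcwd() raises after the cwd is removed (Linux)."""
    original = os.getcwd()
    tmp = tempfile.mkdtemp()
    try:
        os.chdir(tmp)
        os.rmdir(tmp)  # the current directory no longer exists
        try:
            os.getcwd()
        except FileNotFoundError as exc:
            return exc.errno
        return 0  # getcwd unexpectedly succeeded
    finally:
        os.chdir(original)  # restore a valid working directory
```

A program that, like rsync at startup, calls getcwd unconditionally will therefore abort in either situation, which is why Florian's inclination is to fix the caller rather than the library.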