-------- Original Message --------
Subject: rsync seem to be broken on sparc64
Date: 2016-06-27 23:43
From: alexmcwhirter at triadic.us
To: debian-sparc at lists.debian.org

I posted about this in the kernel lists a few months ago to no avail. I
see it on gentoo as well with any kernel newer than 3.18. I came across
this when using lxc on sparc64. The debian template uses rsync to move
the cache's rootfs to the actual container directory.

I've since modified the template to use "cp -a" instead of rsync, which
works. However, this could be an issue for quite a lot of people who use
rsync as a backup solution. It really needs to be addressed if we want
sparc64 to be a release platform.

Here's the gist of it from back then...

On Sun, Feb 21, 2016 at 01:52:55PM -0500, Alex McWhirter wrote:
> On 02/14/2016 07:02 PM, Alex McWhirter wrote:
> > I am having a strange issue where using any 4.X kernel causes rsync to
> > appear to die on a select syscall. Not sure why, maybe it's getting a
> > wrong file descriptor or something. Unfortunately this starts pushing
> > outside of my knowledge level of linux, so bear with me. This is on a Sun
> > V215, but I have also tested it on a Sun Blade 150 and a Sun Ultra 45
> > with the same results. These are all sun4u boxes of course; I haven't
> > tried any sun4v boxes. I'll try to spin up a T5120 this week and find
> > out if it's also an issue on sun4v.
> >
> > Here's what I've tested.
> >
> > 3.14.58 "gentoo" - Works
> > 3.18.26 "vanilla" - Works
> > 4.1.15 "gentoo" - Dead
> > 4.1.17 "vanilla" - Dead
> > 4.4.1 "vanilla" - Dead
> >
> > I don't mind hacking away at kernel sources if anyone can point me in
> > the right direction. It's also worth noting that this only happens when
> > the folder I am attempting to rsync is significantly large in regards to
> > the amount of sub-folders and files. The Gentoo portage tree in particular.
> >
> > Attached is the strace output of a failing rsync job.
> >
>
> I've traced this down a bit further.
>
> Kernel 3.18.26 is working but 3.19.0 is not. Git bisect traced it down
> to this commit.
>
> e5a4b0bb803b39a36478451eae53a880d2663d5b is the first bad commit
> commit e5a4b0bb803b39a36478451eae53a880d2663d5b

Here is the gist of that commit...

https://lkml.org/lkml/2014/12/5/25

Here is the output of rsync when the error occurs.

root@Magi-01:~# rsync -a /export/test/* /export/test2
rsync: [sender] write error: Broken pipe (32)
rsync error: error in socket IO (code 10) at io.c(820) [sender=3.1.1]
root@Magi-01:~#

Here is the output of rsync when executed via gdb.

root@Magi-01:~# gdb /usr/bin/rsync
GNU gdb (Debian 7.11.1-2) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
/root/.gdbinit:1: Error in sourced command file:
No executable file specified.
Use the "file" or "exec-file" command.
Reading symbols from /usr/bin/rsync...(no debugging symbols found)...done.
(gdb) set args -a /export/test/* /export/test2
(gdb) run
Starting program: /usr/bin/rsync -a /export/test/* /export/test2

Program received signal SIGPIPE, Broken pipe.
0xfffff80100528fb4 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:84
84 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb)

I thought I would bring this up here as well. Does anyone have any input
on this?
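
For anyone who wants to try reproducing this without a portage tree at hand, here is a rough sketch of a helper that builds a synthetic directory tree and then runs rsync over it. The function names, directory counts and file sizes are all made up for illustration; the report above only says the tree has to be "significantly large", so the numbers may well need raising before the failure shows up, if it shows up at all on a given kernel.

```python
import os
import subprocess


def make_tree(root, n_dirs=200, files_per_dir=50, size=4096):
    """Create n_dirs sub-directories under root, each holding files_per_dir
    files of `size` random bytes. Returns the number of files written.
    All defaults are arbitrary guesses, not thresholds from the report."""
    created = 0
    for d in range(n_dirs):
        subdir = os.path.join(root, f"dir{d:04d}")
        os.makedirs(subdir, exist_ok=True)
        for f in range(files_per_dir):
            with open(os.path.join(subdir, f"file{f:04d}"), "wb") as fh:
                fh.write(os.urandom(size))
            created += 1
    return created


def run_rsync(src, dst):
    """Run `rsync -a src/ dst/` and return rsync's exit status.
    The failing runs quoted in this thread exited with code 10."""
    cmd = ["rsync", "-a", os.path.join(src, ""), os.path.join(dst, "")]
    return subprocess.run(cmd).returncode
```

Calling make_tree("/export/test") followed by run_rsync("/export/test", "/export/test2") on an affected kernel would be the obvious first experiment; a non-zero return value from run_rsync is the thing to look for.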
I know almost nothing about modern SPARC64 systems especially when they
are running linux. But, can you try this command line and see if it
gives more information before it blows up:

rsync -vvai /export/test/ /export/test2/

On 06/28/2016 05:39 PM, alexmcwhirter at triadic.us wrote:
> I thought I would bring this up here as well. Does anyone have any input
> on this?

--
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
	Kevin Korb			Phone:    (407) 252-6853
	Systems Administrator		Internet:
	FutureQuest, Inc.		Kevin at FutureQuest.net  (work)
	Orlando, Florida		kmk at sanitarium.net (personal)
	Web page:			http://www.sanitarium.net/
	PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
On 2016-06-28 18:10, Kevin Korb wrote:
> I know almost nothing about modern SPARC64 systems especially when they
> are running linux. But, can you try this command line and see if it
> gives more information before it blows up:
> rsync -vvai /export/test/ /export/test2/

I'm not sure how helpful this will be, but it certainly doesn't get very
far.

root@Magi-01:/var/cache/lxc/debian/rootfs-sid-sparc64# rsync -vvai ./* /export/test/
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
cd+++++++++ bin/
>f+++++++++ bin/bash
rsync: [sender] write error: Broken pipe (32)
rsync error: error in socket IO (code 10) at io.c(820) [sender=3.1.1]
root@Magi-01:/var/cache/lxc/debian/rootfs-sid-sparc64#

here's the relevant strace output of the error.

getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
write(1, ".d..t...... bin/\n", 17.d..t...... bin/
) = 17
open("bin/bash", O_RDONLY) = 3
fstat64(3, {st_mode=01, st_size=0, ...}) = 0
write(1, ">f+++++++++ bin/bash\n", 21>f+++++++++ bin/bash
) = 21
mmap(NULL, 270336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffff8010289e000
read(3, "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\2\0+\0\0\0\1\0\0\0\0\0\21\373P"..., 262144) = 262144
select(6, [5], [4], [5], {60, 0}) = 1 (out [4], left {59, 999987})
write(4, "\223\213\0\7\377\0368\r\7E/Eterm\0\333\10W\233\27g\244\201\0\0008\24\6-c"..., 35735) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=10421, si_uid=0} ---
write(2, "rsync: [sender] write error: Bro"..., 45rsync: [sender] write error: Broken pipe (32)) = 45
write(2, "\n", 1
) = 1
rt_sigaction(SIGUSR1, {SIG_IGN, [], 0}, NULL, 0xfffff801004839b8, 8) = 0
rt_sigaction(SIGUSR2, {SIG_IGN, [], 0}, NULL, 0xfffff801004839b8, 8) = 0
waitpid(10422, 0x7feffd71954, WNOHANG) = 0
getpid() = 10421
kill(10422, SIGUSR1) = 0
write(2, "rsync error: error in socket IO "..., 69rsync error: error in socket IO (code 10) at io.c(820) [sender=3.1.1]) = 69
write(2, "\n", 1
) = 1
exit_group(10) = ?
+++ exited with 10 +++
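
As a side note for readers less familiar with the symptom in the trace: EPIPE (and the accompanying SIGPIPE) is what a writer normally gets when nothing holds the other end of a pipe or socket open any more. The sketch below only illustrates that generic mechanism on a plain pipe; the function name is invented for the example, and the 35735-byte default payload merely mirrors the size of the failing write(4, ...) call. It says nothing about why rsync's peer would be gone, or whether the bisected kernel commit makes the kernel report the condition wrongly, which the trace alone does not show.

```python
import errno
import os


def write_to_closed_pipe(payload=b"x" * 35735):
    """Write to a pipe whose read end has already been closed and return
    the symbolic errno name of the failure, or None if the write succeeded.

    Python ignores SIGPIPE by default, so the failed write surfaces as a
    BrokenPipeError instead of killing the process the way the signal
    would in a C program with default signal handling."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no reader left on the other end
    try:
        os.write(write_fd, payload)
    except BrokenPipeError as exc:
        return errno.errorcode[exc.errno]
    finally:
        os.close(write_fd)
    return None
```

On Linux, write_to_closed_pipe() returns "EPIPE", which corresponds to the "Broken pipe (32)" text in the rsync error message.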