samba-bugs at samba.org
2020-Mar-05 22:31 UTC
[Bug 14315] New: rsync hangs when many errors
https://bugzilla.samba.org/show_bug.cgi?id=14315

            Bug ID: 14315
           Summary: rsync hangs when many errors
           Product: rsync
           Version: 3.1.3
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: core
          Assignee: wayne at opencoder.net
          Reporter: mvitale at sinenomine.net
        QA Contact: rsync-qa at samba.org

Created attachment 15843
  --> https://bugzilla.samba.org/attachment.cgi?id=15843&action=edit
test program to aid in reproducing the issue

When performing a local rsync of a large directory (over 10000 files), rsync
hangs if a large number of errors occur on the target (destination) directory.

I am a support engineer for OpenAFS (openafs.org), and this issue was
originally reported by a customer as a possible OpenAFS problem. This customer
observed a hang when rsyncing a large directory into AFS. I was able to
reproduce the problem and demonstrate that the hang is triggered when the
chown calls that rsync issues to restore the group of the destination files
fail. They fail because of an AFS security feature that prohibits the owner of
a file from changing its group ownership. The large number of resulting errors
caused the three rsync processes to stall.

With the help of a colleague, I devised a way to reproduce this hang without
requiring an AFS filesystem.

To recreate the rsync hang, we need a way to generate a large number of errors
while performing the rsync from a normal ext4 filesystem. In our procedure, we
simulate these errors by using a small Linux seccomp program to prohibit
chgrp/chown syscalls.

1. Log in to a Linux account that belongs to at least 2 groups.

   $ id
   uid=1000(mvitale) gid=1000(mvitale) groups=1000(mvitale),10(wheel)

2. Build a program to simulate chown/chgrp errors:

   $ sudo yum install libseccomp libseccomp-devel
   $ cc seccomp-chown.c -o sec-kill-chown -lseccomp

   The source code for seccomp-chown.c is attached to this ticket.

3. Create a large source directory with over 10000 files.

   $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

   These files will all have the group ownership of the user's current group.
   Any sufficiently large directory should work; it doesn't have to be a git
   repo.

4. Switch to the alternate group (starts a new shell)

   $ newgrp wheel
   $ id
   uid=1000(mvitale) gid=10(wheel) groups=10(wheel),1000(mvitale)

5. Enable the error generator (this also starts a new shell)

   $ ./sec-kill-chown
   Running shell. chown() and friends are now unavailable.

6. Create a target directory and run rsync to duplicate the hang.

   $ mkdir target
   $ cd target
   $ rsync -av --delete --log-file=/tmp/rlog.$$ /home/mvitale/linux ./

   This should hang after a few seconds.

7. Exit the two shells (seccomp and newgrp)

   $ exit
   $ exit

I was able to perform a git bisect to isolate the commit that introduced this
hang:

d8587b4 Change the msg pipe to use a real multiplexed IO mode for the data
        that goes from the receiver to the generator.

The following releases show the problem: master, 3.1.3, 3.1.2, 3.1.0
Releases 3.0.9 and older do not exhibit the problem.

Each of the following workarounds was successful for my customer and in my
testing:
- use an older version of rsync (3.0.9 or older)
- specify rsync option --msgs2stderr
- perform the rsync under a userid with the same group as the source files

Thanks for your consideration, and please let me know if there's anything else
I can provide to help.

Regards,
--
Mark Vitale
mvitale at sinenomine.net

--
You are receiving this mail because:
You are the QA Contact for the bug.
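
For step 3 of the report above, a synthetic tree can stand in for the kernel clone, since the report says any sufficiently large directory should work. The following is a minimal sketch and not part of the original report: the function name make_big_tree, the d<N>/f<N> layout, and the use of empty files are all assumptions. It has not been confirmed that empty files trigger the hang as reliably as the kernel tree does.

```shell
# Hypothetical helper (not from the original report): create COUNT empty
# files under DIR, spread over subdirectories of up to 1000 files each.
# Run it BEFORE "newgrp" so the files carry the user's original group.
make_big_tree() {
    dir=$1
    count=${2:-10001}          # default is just over the 10000-file threshold
    mkdir -p "$dir"
    i=1
    while [ "$i" -le "$count" ]; do
        sub="$dir/d$((i / 1000))"
        [ -d "$sub" ] || mkdir -p "$sub"
        : > "$sub/f$i"         # create an empty file without forking
        i=$((i + 1))
    done
}
```

For example, `make_big_tree "$HOME/bigsrc" 10001` would create the tree, and `$HOME/bigsrc` would then replace /home/mvitale/linux as the source argument in step 6.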
https://bugzilla.samba.org/show_bug.cgi?id=14315

--- Comment #1 from Mark Vitale <mvitale at sinenomine.net> ---
Sorry, I gave the wrong commit in my report. I bisected this hang to:

1a2704512a6f6c9bf267042ff8beb50a24e1d057 is the first bad commit
commit 1a2704512a6f6c9bf267042ff8beb50a24e1d057
Author: Wayne Davison <wayned at samba.org>
Date:   Wed Dec 21 08:30:07 2011 -0800

    Improve the handling of verbose/debug messages
https://bugzilla.samba.org/show_bug.cgi?id=14315

Wayne Davison <wayne at opencoder.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #2 from Wayne Davison <wayne at opencoder.net> ---
Should be fixed in the latest git version.
https://bugzilla.samba.org/show_bug.cgi?id=14315

--- Comment #3 from Mark Vitale <mvitale at sinenomine.net> ---
Thank you very much!