Adam Nielsen
2016-Dec-18 14:13 UTC
Feature request: Closing destination files in a separate thread
Hi all,

I'm wondering whether it is feasible to have an option that will make rsync spawn a separate thread to close files it has created, to avoid the main process blocking while the destination files are flushed during the close operation?

The reason I ask is that it is currently very slow to use rsync on a fast locally-mounted network filesystem, as you see the following behaviour:

1. rsync reads the source file, network remains idle
2. Kernel buffers start to fill up
3. Some seconds later, kernel starts writing data to destination filesystem
4. rsync gets to the end of the file and closes the target file
5. rsync hangs for 10+ seconds while the target file's data gets flushed over the network. No data is being read from the source.
6. Back to step 1, rsync reads the next file from the source, while the network is idle as the kernel buffers are now empty.

This gives the impression that the copy process alternates between reading then writing, instead of both reading and writing at the same time. It results in a much slower operation because in step 1 above, the network is idle, then in step 5, the local disk is idle.

I am thinking that if the final close operation was performed in a separate thread, it would allow the main rsync operation to continue and start copying the next file while the previous one was still being flushed.

This would mean both the source (local disk) and the target (network) would be fully utilized instead of them sitting idle for a large amount of the operation.

Is something like this feasible?

Many thanks,
Adam.

P.S. I am using a CIFS mount for this, and when I mount it with cache=none then the alternating read-then-write behaviour goes away, but the transfer rate drops by almost 30% so it ends up being slower overall.
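To make the idea concrete, here is a rough sketch of the "close in a separate thread" pattern. It is written in Python purely for illustration and is not rsync code or a proposed patch; the names copy_files, closer and max_pending are made up for this example. The main loop writes each destination file and then hands the open descriptor to a worker thread, which performs the potentially slow flush and close while the main loop moves on to the next source file.

```python
import os
import queue
import threading


def closer(q, errors):
    """Worker thread: flush and close each descriptor handed over by the copier."""
    while True:
        item = q.get()
        if item is None:              # sentinel: no more files will arrive
            break
        path, fd = item
        try:
            os.fsync(fd)              # may block while data is flushed to the target
        except OSError as exc:
            errors.append((path, exc))    # write errors can surface only at this point
        finally:
            os.close(fd)


def copy_files(pairs, max_pending=4, chunk_size=1 << 20):
    """Copy (src, dst) pairs; the flush/close of each dst happens in a worker thread.

    max_pending (hypothetical knob) bounds how many written-but-unflushed
    files may queue up before the copier has to wait.
    Returns a list of (path, exception) for files whose flush failed.
    """
    errors = []
    q = queue.Queue(maxsize=max_pending)
    worker = threading.Thread(target=closer, args=(q, errors))
    worker.start()
    try:
        for src, dst in pairs:
            fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            try:
                with open(src, "rb") as fin:
                    while True:
                        chunk = fin.read(chunk_size)
                        if not chunk:
                            break
                        view = memoryview(chunk)
                        while view:               # os.write may write only part
                            written = os.write(fd, view)
                            view = view[written:]
            except BaseException:
                os.close(fd)                      # do not leak the fd on failure
                raise
            q.put((dst, fd))                      # hand off; continue with next file
    finally:
        q.put(None)
        worker.join()
    return errors
```

One consequence of this design is that a failure during the flush is reported only after the copier has already moved on, so any real implementation would need to decide how to attribute and report such late errors for the affected file.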
Kevin Korb
2016-Dec-18 18:03 UTC
Feature request: Closing destination files in a separate thread
Can you let rsync do the networking? If rsync isn't doing the networking then it isn't much more capable than cp, yet it is significantly slower than cp.
Christoph Biedl
2016-Dec-23 11:38 UTC
Feature request: Closing destination files in a separate thread
Adam Nielsen wrote...

> I'm wondering whether it is feasible to have an option that will make
> rsync spawn a separate thread to close files it has created, to avoid
> the main process blocking while the destination files are flushed during
> the close operation?

While your scenario resembles a problem I've been experiencing for years, I'm not sure whether such a change would help.

> The reason I ask is that it is currently very slow to use rsync on a
> fast locally-mounted network filesystem, as you see the following
> behaviour:

If I understand correctly, you transfer from a local file system to a network (CIFS) one. Does this happen in a local-local scenario as well? And is this related to the size of the files?

My woes were copying many rather small (100k) files onto a USB flash drive. Once the buffers are filled up, the write rate drops to roughly one percent of the raw value (like 200 kbyte/sec on a USB 2.0 drive). My workaround was to switch the file system from ext4 without journal to f2fs, assuming ext4 puts inode commits into an isolated transaction (no-barrier didn't help).

Might be a different story but it sounded somewhat familiar.

Christoph