dbonde+forum+rsync.lists.samba.org at gmail.com wrote:

> It is exactly as I wrote. On a network volume (A) a "sparse disk image bundle" (B), i.e., a type of disk image used in OS X, is stored. B is then mounted locally (i.e., local to where rsync is run) on a computer (C) where it appears as one of many volumes.
>
> In other words, B is stored on A. A is then mounted (using AFP) on C. C then mounts B (= opens a file on a network volume, but instead of opening e.g. a spreadsheet in Excel, opening B shows a new volume on the desktop of C) stored on A.

> The computer where it is mounted just sees a mounted volume - it can't distinguish between a disk image stored remotely or stored on the computer's internal hard drive.

I wouldn't count on that !

> I assume you are familiar with the idea of disk images?

I think most are familiar with disk images - but not so many with the specific implementations used by OS X.

OS X has the concept of a "bundle". To the user this appears as a single file with its own name and icon. Internally it's a folder tree containing a number of files/folders. As a quick test, I've just created a 100M sparse image; here's the contents before I've added any files:

> $ ls -lRh a.sparsebundle/
> total 16
> -rw-r--r--  1 simon  staff   496B 25 Jan 14:36 Info.bckup
> -rw-r--r--  1 simon  staff   496B 25 Jan 14:36 Info.plist
> drwxr-xr-x  8 simon  staff   272B 25 Jan 14:36 bands
> -rw-r--r--  1 simon  staff     0B 25 Jan 14:36 token
>
> a.sparsebundle//bands:
> total 34952
> -rw-r--r--  1 simon  staff   2.1M 25 Jan 14:37 0
> -rw-r--r--  1 simon  staff   2.4M 25 Jan 14:36 1
> -rw-r--r--  1 simon  staff   2.0M 25 Jan 14:36 2
> -rw-r--r--  1 simon  staff   912K 25 Jan 14:36 6
> -rw-r--r--  1 simon  staff   8.0M 25 Jan 14:36 b
> -rw-r--r--  1 simon  staff   1.7M 25 Jan 14:36 c

It is **NOT** the same as a unix sparse file ! The contents are divided up into chunks ("bands"), with each chunk stored in a file of its own. I suspect this may also have an impact on performance. As the disk is filled, the band files grow in number and size - with the disk full, the bands run complete from 0 through c, with all but c being 8M.
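For anyone wanting to reproduce the test, a sparse bundle like the one above can be created and mounted from the command line - a minimal sketch ("Test" is just a placeholder volume name):

$ hdiutil create -size 100m -type SPARSEBUNDLE -fs HFS+ -volname Test a
$ hdiutil attach a.sparsebundle    # mounts at /Volumes/Test
$ hdiutil detach /Volumes/Test     # eject when finished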
As an aside, there is also an unfortunate combination of name and Finder behaviour. If you set the Finder to show file extensions, it will show (eg in this case) "a.sparsebundle" - but if the name is a bit longer, it shows the beginning of the name, an ellipsis "...", and the end of the name including the extension. My mother was "a little confused" when she saw a folder on my screen with several "...arsebundle"s !

There are a lot of layers in your setup - any of them (or some combination thereof) could be slowing things down :

Rsync
Filesystem on B
Loopback mount of B (and associated systems) on C
AFP between A and C - is the host for A an OS X machine running native AFP, or something like Linux running Netatalk ?
Filesystem on A - inc sparse bundle file support
Disk subsystem on A

A few things come to mind ...

1) I am aware that AFP has some performance issues with some combinations of operations - no, I don't know if this is one of them.
2) More importantly, if you look back through the archives, there was a thread not long ago about poor performance of rsync for "very large" file counts - and 45 million is "large". I didn't pay much attention, but IIRC the originator of that thread was proposing some alterations to improve things.
3) While rsync is designed to operate efficiently over slow/high latency links, 100Mbps is always going to have an impact on throughput.

As an experiment, can you mount the disk of A locally on C ? Shut down the system hosting A and put it in FireWire Target Mode, then connect it to C - A's disk then appears as a local FireWire disk on C. This will show whether AFP has any bearing on performance. If the computer hosting A doesn't support target mode then you're a bit stuffed - but there may be other options. Or alternatively, connect the external disk directly to A's host rather than to C. Either way, you can then run rsync as a local copy without the network element.

But as I write this, something far far more important comes to mind. Files on HFS disks are not like files on other filesystems (though I believe NTFS has a feature which adds similar complications). I am not sure exactly how rsync handles this - I do recall that Apple's version adds support for the triplet of "metadata + resource fork + data fork". From memory this results in many files getting re-copied every time, regardless of whether they were modified or not. Memory is only vague, but I think it was something to do with the comparison of source and dest not working properly when one end is looking at the "whole file" and the other is only looking at one part.

I would suggest doing a test copy using only a small part of the tree, then doing the copy again (so no files have actually changed) and watching carefully what gets copied. I vaguely recall (from a looong time ago) that any file with a resource fork was re-copied each time even though it hadn't changed.

If this is the case, and I'm not misremembering, then it's possible that the combination of "rsync not handling very large file sets well" and "resource forks causing issues" could be (at least partly) behind your performance problem.

Another test I'd be inclined to try would be to copy things one restore point at a time. As you'll be aware, each restore point is its own timestamped directory - hardlinked to the previous one for files that haven't changed. Try rsyncing only the last one, then the last two, then the last three, and so on; you can use --include and --exclude to do this - see the sketch below. See how performance varies as the number of included trees increases - I suspect it increases more than linearly, given the work involved in tracking hard links.
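A minimal sketch of the filter approach - the snapshot names and paths here are made up, and -n / -i make it a dry run that itemizes what would be transferred:

# Copy only the two newest restore points; drop -n for the real run.
$ rsync -aH -n -i \
      --include='/2016-01-30-*/' \
      --include='/2016-01-31-*/' \
      --exclude='/*' \
      /Volumes/TimeMachine/Backups.backupdb/MyMac/ /Volumes/Copy/

The trailing '/' on each include restricts it to directories, and the final --exclude='/*' drops every other top-level entry; -H is needed so hard links between the included trees are re-created rather than copied twice.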
dbonde+forum+rsync.lists.samba.org at gmail.com
2016-Jan-25 16:07 UTC
Why is my rsync transfer slow?
Thank you. I will try your suggestions. First I will connect the NAS directly to the computer. (Do you recommend USB2 or 1 Gb Ethernet? Or should I daisy-chain the external HD and the NAS?) Then it would look like this:

Computer <--FW800--> HD <--USB2--> NAS

The other option is

HD <--FW800--> Computer <--USB2 or 1000 Mbit Ethernet--> NAS

But I still must say it is weird that rsync seems slower than Finder. I also might have a look at ditto or CpMac.

On 2016-01-25 15:50, Simon Hobson wrote:
> [...]
dbonde+forum+rsync.lists.samba.org at gmail.com wrote:

> Thank you. I will try your suggestions. First I will connect the NAS

Ah, you didn't mention a NAS ! How is it connected to the computer hosting "A" ? If via network then you've added *another* layer.

> directly to the computer (Do you recommend USB2 or 1 Gb Ethernet? Or should I daisy-chain the external HD and the NAS?) Then it would look like this:
>
> Computer <--FW800--> HD <--USB2--> NAS

Highly unlikely to work - the drive won't have a Firewire-to-USB bridge. Having the two different ports is to allow the drive to be connected to *a* host with either Firewire *or* USB.

> The other option is
>
> HD <--FW800--> Computer <--USB2 or 1000 Mbit Ethernet--> NAS

If you use a network connection then you've still got that network layer. If connected via USB, does it appear to the host as "just a drive" ? If so then use that - and see the throughput check sketched below.

> I also might have a look at ditto or CpMac.

Also consider Carbon Copy Cloner - it has a free trial.
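Whichever connection wins out, it may be worth measuring raw sequential throughput before timing rsync again, so the disk/network path itself can be ruled in or out - a rough sketch, with a placeholder mount point:

# Write ~1 GB and read it back; dd reports bytes/sec when it finishes.
$ dd if=/dev/zero of=/Volumes/NAS/ddtest bs=1m count=1024
$ dd if=/Volumes/NAS/ddtest of=/dev/null bs=1m
$ rm /Volumes/NAS/ddtest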
dbonde+forum+rsync.lists.samba.org at gmail.com
2016-Feb-06 18:44 UTC
Why is my rsync transfer slow?
I scrapped all my previous progress and started over with a different "connection setup": now the NAS is connected to the computer using 1 Gbit wired Ethernet, while the source disk is still using FW800. On bigger files I now typically get 20-30 MB/s, so that is a substantial improvement. There are still occasional hiccups (I'm waiting for one to pass as I write this; it has lasted half an hour), but my impression is that they are fewer than before.

However, there are still problems.

1. Does rsync leak memory? When I started this transfer Jan 31 (yes, a week ago) the three rsync processes that were spawned used tens or maybe hundreds of kB of memory each. Currently, according to Activity Monitor, they use 2.69 GB, 2.47 GB, and 2.41 GB (of which 2.40 GB compressed). They have also used a lot of CPU time:

% ps -auxww | grep rsync
rsync  19545  11  20  SN+  723:20.57  0.0
rsync  19544  46  20  UN+  263:32.55  0.0
rsync  19543  31  20  SN+  936:51.55  0.0

Mostly I still let this computer idle, but when I use it it sometimes freezes in a way reminiscent of a computer that hasn't got enough memory (it has 8 GB) - although, according to Activity Monitor, memory pressure is still green (used 6.5 GB, cache 1.4 GB, swap 10 GB).

2. Something is broken with the transfer and rsync's handling of hard links. On the source:

% du -sh
1.0T    .

On the destination:

% du -sh
3.9T    .

Yes, the destination is four times bigger, and I have only transferred about 40% of the contents of the source (some 80 directories out of 198; I am currently at 63 million files). I think I read somewhere that Time Machine uses a special filesystem feature, hard links to directories. Does rsync handle this?
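(For reference: stock rsync only preserves hard links when explicitly told to - -a does not imply -H - and it has no notion of HFS+ directory hard links at all, so each snapshot directory gets walked as a full tree. Without -H, every extra link to a file is transferred as an independent copy, which by itself could multiply the destination's size; with -H, rsync keeps a device/inode table in memory, which may also bear on the memory growth in point 1. A sketch, with placeholder paths, worth comparing against the actual invocation:)

# -H (--hard-links) re-creates hard links between files on the destination;
# without it each additional link becomes an independent copy.
$ rsync -aH --progress /Volumes/TimeMachine/ /Volumes/Backup/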
I have also gathered some rudimentary statistics: the size of rsync's log file, as well as the number of lines in it (two lines per transferred file). Start time was Jan 31, 13:11. First the sizes (repeated runs of "ls -lh /tmp/rsync.log", condensed):

-rw-r--r--  1 db  wheel  1.6G  1 Feb 01:06 /tmp/rsync.log
-rw-r--r--  1 db  wheel  2.9G  1 Feb 09:11 /tmp/rsync.log
-rw-r--r--  1 db  wheel  8.4G  2 Feb 18:26 /tmp/rsync.log
-rw-r--r--  1 db  wheel  9.7G  3 Feb 01:37 /tmp/rsync.log
-rw-r--r--  1 db  wheel   12G  3 Feb 22:21 /tmp/rsync.log
-rw-r--r--  1 db  wheel   15G  4 Feb 21:20 /tmp/rsync.log
-rw-r--r--  1 db  wheel   15G  4 Feb 23:49 /tmp/rsync.log
-rw-r--r--  1 db  wheel   17G  5 Feb 07:22 /tmp/rsync.log
-rw-r--r--  1 db  wheel   18G  5 Feb 18:24 /tmp/rsync.log
-rw-r--r--  1 db  wheel   19G  6 Feb 10:18 /tmp/rsync.log
-rw-r--r--  1 db  wheel   20G  6 Feb 19:04 /tmp/rsync.log

As you can see, in the first 12 h the log grew to 1.6 GB. After 53 h it was at 8.4 GB, but then the growth slowed down, and in the last 24 h it grew just 2 GB (from 18 GB to 20 GB; compare the third-last and last entries). This is not super exact, but I think it gives a rough indication of the performance.

A similar metric, but this time counting the lines in the log (repeated runs of "date; wc -l /tmp/rsync.log", condensed; the wc timings are from prefixing the command with "time"):

Mon  1 Feb 2016 09:23:41 CET    20045861 lines
Mon  1 Feb 2016 18:20:21 CET    29886346 lines
Mon  1 Feb 2016 18:21:01 CET    29906243 lines   (wc:  6.53s user   4.51s system  46% cpu    23.860 total)
Wed  3 Feb 2016 01:37:28 CET    66717053 lines   (wc: 14.57s user   9.09s system  55% cpu    42.701 total)
Wed  3 Feb 2016 22:21:29 CET    83738765 lines   (wc: 18.44s user  12.98s system  39% cpu   1:19.75 total)
Thu  4 Feb 2016 21:20:57 CET   103578124 lines   (wc: 22.70s user  16.81s system  30% cpu   2:07.54 total)
Thu  4 Feb 2016 23:46:19 CET   106439787 lines   (wc: 23.41s user  18.00s system  26% cpu   2:34.63 total)
Fri  5 Feb 2016 07:22:06 CET   113853825 lines   (wc: 24.99s user  18.90s system  31% cpu   2:20.75 total)
Fri  5 Feb 2016 18:24:44 CET   122665638 lines   (wc: 26.91s user  19.15s system  42% cpu   1:48.37 total)
Sat  6 Feb 2016 10:18:36 CET   134194513 lines   (wc: 29.38s user  20.62s system  41% cpu   1:59.18 total)
Sat  6 Feb 2016 19:05:15 CET   136240005 lines   (wc: 29.85s user  21.57s system  33% cpu   2:32.81 total)

What gives?