Kevin Korb
2016-Jun-09 10:29 UTC
Can rsync assume that the destination directory is empty ?
Actually, don't do --ignore-times. Even if it did prevent the stat calls, it would also tell rsync to not care about matching files in the --link-dest dir, which would be very bad.

On 06/09/2016 06:27 AM, Kevin Korb wrote:
> There isn't an option for that and it isn't actually required that the
> target directory be empty (just a good idea). Plus it has to do the
> stat calls on the other end anyway so I doubt there would be much
> performance benefit.
>
> Maybe --ignore-times would cause it to not look but I kinda doubt it and
> I am too tired to do an strace right now ;)
>
> On 06/09/2016 05:32 AM, Arnaud Aujon Chevallier wrote:
>> Hello,
>>
>> I'm currently using rsync to back up up to 1 TB of relatively small
>> files (hundreds of KB mostly).
>>
>> My backup strategy is to use a full backup and then back up the diff
>> every day using hardlinks to the previous backup. This means that each
>> time I use rsync, the destination directory is empty.
>>
>> Using strace, I can see that rsync makes an 'lstat' call to try to see
>> if the file already exists in my destination directory. Is there an
>> option to tell rsync that the destination directory is empty ?
>>
>> Do you think that avoiding this call can improve rsync performance in
>> this specific case ?
>>
>> I tried reading the source code, but I'm not exactly sure where this
>> lstat call happens.
>>
>> Thanks a lot,
>>
>> Arnaud Aujon Chevallier
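For readers who want to picture the backup scheme described in the original question, here is a minimal sketch in Python that only builds the rsync command line for one daily snapshot; it does not run it. The directory layout (one dated directory per day under a backup root, with yesterday's snapshot as the --link-dest reference), the paths, and the function name are assumptions made for illustration and are not taken from the thread. Only the `-a` and `--link-dest` options are real rsync options.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def build_rsync_command(source: str, backup_root: str, today: date) -> list[str]:
    """Return an rsync argv for a hard-linked daily snapshot (hypothetical layout)."""
    root = PurePosixPath(backup_root)
    # Assumption: the previous backup is simply yesterday's dated directory.
    previous = root / (today - timedelta(days=1)).isoformat()
    # The new snapshot goes into a fresh, empty directory named after today.
    target = root / today.isoformat()
    return [
        "rsync",
        "-a",                       # archive mode: recurse, keep perms/times/links
        f"--link-dest={previous}",  # hard-link unchanged files to the previous snapshot
        f"{source.rstrip('/')}/",   # trailing slash: copy the directory's contents
        f"{target}/",
    ]


if __name__ == "__main__":
    print(" ".join(build_rsync_command("/data", "/backups", date(2016, 6, 9))))
```

Because the target directory is new on every run, every file rsync looks for there is missing, which is where the failing lstat calls discussed in this thread come from.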
Arnaud Aujon Chevallier
2016-Jun-09 10:35 UTC
Can rsync assume that the destination directory is empty ?
Thanks for your answer,

I ran some more tests and they show that the lstat calls are only responsible for 3.7 % of the total time.

So we could avoid about a third of them (the number in the errors column), which would be about 1 %, not very interesting :)

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  3.74    1.792339           1   2088744    693051 lstat
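The "about a third, so about 1 %" estimate can be checked with a few lines of Python. This is only a worked version of the arithmetic, using the figures from the lstat line of the strace summary quoted above. It assumes that a failed lstat costs roughly the same as a successful one, which the thread does not establish.

```python
# Figures copied from the lstat line of the strace summary in this message.
LSTAT_TIME_SHARE = 3.74   # percent of the time reported by strace
LSTAT_SECONDS = 1.792339  # seconds attributed to lstat
LSTAT_CALLS = 2_088_744   # total lstat calls
LSTAT_ERRORS = 693_051    # lstat calls that returned an error

# Fraction of lstat calls that failed (the ones an "empty target" hint could skip).
failed_fraction = LSTAT_ERRORS / LSTAT_CALLS

# Assumption: failed and successful lstat calls cost about the same on average.
avoidable_share = LSTAT_TIME_SHARE * failed_fraction
avoidable_seconds = LSTAT_SECONDS * failed_fraction

print(f"failed lstat calls: {failed_fraction:.1%}")
print(f"avoidable share:    {avoidable_share:.2f} %")
print(f"avoidable seconds:  {avoidable_seconds:.2f} s")
```

Under that assumption the avoidable part comes to a little over 1 % of the reported time, in line with the estimate above.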
Simon Hobson
2016-Jun-09 13:11 UTC
Can rsync assume that the destination directory is empty ?
On 9 Jun 2016, at 11:35, Arnaud Aujon Chevallier <arnaud at intelibre.fr> wrote:
> I ran some more test and it show that the lstat calls are only responsible for 3.7 % of the total time.
>
> So we could avoid about a third of them (the errors numbers), which will be about 1%, not very interesting :)
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>   3.74    1.792339           1   2088744    693051 lstat

Is that wallclock time or CPU time ?

AIUI rsync is optimised to work over slow or high-latency links and does parallel operations. Thus, while it may be doing checks it doesn't need to, these don't necessarily contribute to the total time taken, which will be dominated by data transfer times. Obviously this will depend on various factors, particularly the link speed and the performance of the two systems. If the stat operations happen in parallel with the data transfer, it may be that they don't affect the overall time taken at all.
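On the wallclock versus CPU question: the table has the shape of `strace -c` output, and according to the strace man page that summary reports system CPU time by default, with `-w` switching it to wall-clock time. Whether Arnaud's run used `-w` is not stated in the thread. As a rough way to see the difference locally, the following Python sketch times lstat calls on names that do not exist in an empty temporary directory and reports both wall-clock and CPU time. The function name and file names are made up for illustration, and the sketch does not model rsync itself (no network, no transfer running in parallel).

```python
import os
import tempfile
import time


def time_missing_lstat(directory: str, count: int) -> tuple[float, float, int]:
    """lstat `count` nonexistent names; return (wall seconds, CPU seconds, misses)."""
    misses = 0
    wall_start = time.perf_counter()  # wall-clock timer
    cpu_start = time.process_time()   # user + system CPU time of this process
    for i in range(count):
        try:
            os.lstat(os.path.join(directory, f"missing-{i}"))
        except FileNotFoundError:     # the failures strace counts under "errors"
            misses += 1
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu, misses


if __name__ == "__main__":
    # A fresh temporary directory stands in for the empty backup target.
    with tempfile.TemporaryDirectory() as empty_dir:
        wall, cpu, misses = time_missing_lstat(empty_dir, 100_000)
    print(f"{misses} failed lstat calls: wall {wall:.3f} s, cpu {cpu:.3f} s")
```

On a local disk the two numbers will usually be close; on a slow or remote filesystem the wall-clock figure can be much larger than the CPU figure, which is why the distinction matters when judging whether these calls affect the total run time.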