samba-bugs at samba.org
2014-Jan-14 17:22 UTC
[Bug 10380] New: Non-Nested Folder Optimisation
https://bugzilla.samba.org/show_bug.cgi?id=10380

Summary: Non-Nested Folder Optimisation
Product: rsync
Version: 3.1.0
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P5
Component: core
AssignedTo: wayned at samba.org
ReportedBy: me at haravikk.com
QAContact: rsync-qa at samba.org

One handy feature of most (all?) unix file systems is that if the contents of a folder are changed, then the folder's modified time will be updated as well, which presents an opportunity for optimised comparisons.

Quite simply, if a folder's modified time is the same as that of the corresponding folder in the destination, then no comparison of its contents needs to be performed. At least, that is true if the folder doesn't contain any nested folders, as modifications to these do not affect the parent folder's modified time.

What I would like to propose is that when rsync begins generating a file list for a destination directory, it should attempt to mark any directories that do not contain nested directories. When it comes time to compare such a directory, the comparison can be done using only the modified times (if rsync is operating in that mode), and the file lists only need to be compared if these times differ. Provided rsync is able to generate enough of the destination file list in advance, this should allow many unchanged folders to be skipped in their entirety, without having to delve further into their contents.

This feature could be enhanced by the use of a metadata file (see bug 10379) that stores a flag in a destination folder if it contains no nested directories. This way, so long as the metadata file is valid, there is no need to process the directory's file list before an optimised comparison is performed.

--
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
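To make the proposed rule concrete, here is a minimal Python sketch of it. It is an illustration only, not rsync code: the function name `dirs_needing_scan` is hypothetical, mtimes are compared at whole-second resolution as an assumption, and the sketch simply trusts the proposal's premise that an unchanged leaf-directory mtime means unchanged contents.

```python
import os
from pathlib import Path


def dirs_needing_scan(src_root: Path, dst_root: Path) -> list:
    """Hypothetical helper illustrating the proposed shortcut (not rsync code).

    Walks the source tree and returns the relative paths of directories whose
    contents would still be compared. A directory is skipped only when it has
    no subdirectories and its destination counterpart exists with the same
    whole-second mtime.
    """
    to_scan = []
    for dirpath, dirnames, _filenames in os.walk(src_root):
        src_dir = Path(dirpath)
        rel = src_dir.relative_to(src_root)
        dst_dir = Path(dst_root) / rel
        # "Leaf" = no nested directories (os.walk also lists symlinks to dirs here).
        is_leaf = not dirnames
        if is_leaf and dst_dir.is_dir():
            if int(src_dir.stat().st_mtime) == int(dst_dir.stat().st_mtime):
                continue  # proposed shortcut: trust the mtime, skip this directory
        to_scan.append(rel)
    return to_scan
```

Because only the directory's own mtime is checked, a file edited in place inside a leaf directory is skipped too, so the shortcut is only as safe as the filesystem behaviour it assumes.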
I may be missing the point, but the contents of a file within a directory can change without affecting the directory.

> ls -ld . dog
> drwxrwxrwt 24 root wheel 816 Jan 14 11:39 .
> -rw-rw-r-- 1 pedzan wheel 0 Jan 14 11:41 dog
>
> echo more >> dog
>
> ls -ld . dog
> drwxrwxrwt 24 root wheel 816 Jan 14 11:39 .
> -rw-rw-r-- 1 pedzan wheel 5 Jan 14 12:39 dog

dog changed. Its timestamp changed. But the containing directory's timestamp did not. This is on a Mac, but I think this is generally true. If you add or delete a file, the directory's modified time changes, but not if the files within it change.
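The `ls` experiment above can be reproduced with a small, portable Python sketch (the helper names are made up for this illustration, and the file names just echo the session). POSIX requires that creating or removing a directory entry mark the parent directory's mtime for update, whereas writing to an existing file updates only that file's timestamps, so on a conforming filesystem only the create and delete cases should report a changed directory mtime.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def dir_mtime_changes(action: Callable[[Path], None]) -> bool:
    """Run action(d) in a fresh directory d and report whether d's mtime moved."""
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "dog").write_text("")   # pre-existing file, like `dog` in the session
        os.utime(d, (0, 0))          # pin the directory mtime to a known old value
        action(d)
        return d.stat().st_mtime != 0


def append_to_existing(d: Path) -> None:
    # Equivalent of `echo more >> dog`: changes the file, not the directory entry.
    with open(d / "dog", "a") as fh:
        fh.write("more\n")


def create_new_file(d: Path) -> None:
    # Adds a new directory entry, so the parent directory is modified.
    (d / "cat").write_text("new\n")


def delete_file(d: Path) -> None:
    # Removes a directory entry, so the parent directory is modified.
    (d / "dog").unlink()


if __name__ == "__main__":
    for name, fn in [("append to existing file", append_to_existing),
                     ("create new file", create_new_file),
                     ("delete file", delete_file)]:
        print(f"{name}: directory mtime changed = {dir_mtime_changes(fn)}")
```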
https://bugzilla.samba.org/show_bug.cgi?id=10380

Wayne Davison <wayned at samba.org> changed:

What       | Removed | Added
-----------+---------+---------
Status     | NEW     | RESOLVED
Resolution |         | WONTFIX

--- Comment #1 from Wayne Davison <wayned at samba.org> 2014-01-19 22:10:43 UTC ---
Changed files don't affect a directory's mtime, only new files. Overall it's not a very reliable way to try to optimize file transfers.
https://bugzilla.samba.org/show_bug.cgi?id=10380

--- Comment #2 from Haravikk <me at haravikk.com> 2014-01-20 12:40:05 UTC ---
Are you sure? It seems to update the folder mtime on HFS+ at least, but if it doesn't work on other file systems then yeah, you're right, maybe it's not worth it.
https://bugzilla.samba.org/show_bug.cgi?id=10380

--- Comment #3 from Kevin Korb <rsync at sanitarium.net> 2014-01-20 14:03:40 UTC ---
That is an HFS+ "feature". It is why Apple's Time Machine backup system works faster than rsync. They do utilize this optimization, but it would only work on HFS+.