bugzilla-daemon@dp3.samba.org
2006-Jan-10 17:03 UTC
DO NOT REPLY [Bug 3392] New: fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

Summary: fuzzy misbehaving if source is a file
Product: rsync
Version: 2.6.6
Platform: All
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P3
Component: core
AssignedTo: wayned@samba.org
ReportedBy: egmont@uhulinux.hu
QAContact: rsync-qa@samba.org

I run rsync 2.6.6 on both the server and the client and perform a download

    rsync --fuzzy --other-options... rsync://some/url /local/directory

Both the remote and the local URLs are absolute paths.

If the remote URL is a directory and I perform recursive synchronization (e.g. "-a") then the --fuzzy option works perfectly just as I expect it, and it co-operates nicely with the --compare-dest or --copy-dest option.

However, if the remote URL is a single plain file, then --fuzzy misbehaves. No matter what the remote or local path is, no matter if I specify a --compare-dest or --copy-dest option or not, no matter what its value is, these are all completely ignored, and the local file with the most similar name is searched in the current directory of the rsync process. This is seen from strace's output (only "." is opened as a directory), seen from "skipping directory xyz" messages that mention the subdirectories of the current dir, and seen from the fact that if I place the similar file here then rsync is much faster (i.e. it finds it here).

Instead, rsync should look for similar filenames under the target directory (the directory component of the local path given in the last argument), or under the directories given by --compare-dest or --copy-dest.

(Also, I think that generally if only absolute paths are given to rsync, it should do nothing with the current directory, it should be irrelevant what it is.)

--
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.
bugzilla-daemon@dp3.samba.org
2006-Jan-11 00:23 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

------- Comment #1 from hashproduct@verizon.net 2006-01-10 17:22 MST -------
This behavior is a consequence of the strange logic in get_local_name in main.c. If the destination path is given as a file, then rsync uses a "local name" and accesses the destination file by its full path rather than first changing to the containing directory of the destination file.

When I was writing my custom rsync, I found that the behavior of get_local_name fouled up default ACL observance, so I rewrote and heavily commented get_local_name. The upshot is that my rsync changes to the containing directory when receiving no matter what. Please consider making the same change in the official rsync. This change may help the situation with --fuzzy to some degree, but issues remain, such as whether the basename of the source path or the destination path is used to search for a fuzzy basis file.

My custom rsync is available here: http://mysite.verizon.net/hashproduct/myrsync/
bugzilla-daemon@dp3.samba.org
2006-Jan-11 00:23 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

------- Comment #2 from hashproduct@verizon.net 2006-01-10 17:22 MST -------
Created an attachment (id=1661)
 --> (https://bugzilla.samba.org/attachment.cgi?id=1661&action=view)
main.c from my custom rsync, showing rewritten get_local_name
bugzilla-daemon@dp3.samba.org
2006-Jan-15 07:28 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

wayned@samba.org changed:

What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |        |FIXED

------- Comment #3 from wayned@samba.org 2006-01-15 00:27 MST -------
As Matt noted, the fuzzy option was expecting the current directory to be the parent directory of the destination file, and this wasn't true for a single file being copied to a new name. I have checked in a fix that makes rsync always push_dir() into the destination file's parent directory. (Thanks for the attachment, Matt -- I used some of your comments and the general logic from get_local_name(), though I rewrote it.)

One other nice side-effect is that rsync gives a better error now if you copy a single file to a /totally/bogus/path/name.
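The idea behind the fix in comment #3 can be pictured with a small sketch. This is not rsync's code (rsync is written in C, and the real logic lives in get_local_name() in main.c); the function name split_destination and its return shape are assumptions made purely for illustration. It shows only that, for a single-file copy, the receiver works from the destination file's parent directory, so that a search of the "current" directory scans the right place.

```python
import os
import tempfile


def split_destination(dest_path):
    """Return (directory_to_enter, local_name) for a destination path.

    Hypothetical helper: if dest_path is an existing directory, files are
    received into it under their own names (local_name is None). Otherwise
    dest_path names a single file, and the receiver should work from the
    parent directory of that file.
    """
    if os.path.isdir(dest_path):
        return os.path.abspath(dest_path), None
    parent, name = os.path.split(os.path.abspath(dest_path))
    if not os.path.isdir(parent):
        # Loosely mirrors the clearer error for a /totally/bogus/path/name.
        raise FileNotFoundError(f"destination directory does not exist: {parent}")
    return parent, name


with tempfile.TemporaryDirectory() as tmp:
    real_tmp = os.path.realpath(tmp)
    directory, name = split_destination(os.path.join(real_tmp, "pkg_1.1.deb"))
    print(directory == real_tmp, name)
```

With a file destination, the sketch returns the parent directory and the file's basename; with a nonexistent parent it fails early instead of searching an unrelated directory.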
bugzilla-daemon@dp3.samba.org
2006-Jan-15 15:20 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

------- Comment #4 from hashproduct@verizon.net 2006-01-15 08:19 MST -------
Nice. It occurs to me that maybe the first call to do_stat should be changed to link_stat(dest_path, &st, keep_dirlinks) in order to obey --keep-dirlinks when finding the top-level target directory.
samba-bugs@samba.org
2006-Mar-13 13:12 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

egmont@uhulinux.hu changed:

What       |Removed |Added
----------------------------------------------------------------------------
Status     |CLOSED  |REOPENED
Resolution |FIXED   |

------- Comment #5 from egmont@uhulinux.hu 2006-03-13 07:12 MST -------
Reopening, since it's not okay in 2.6.7 (though definitely different than it was in 2.6.6).

In 2.6.7, the --fuzzy option causes a search for a similar filename in the target directory of the new file. This still means the value of the --compare-dest or --copy-dest option is ignored. Rsync should search for similar files in the directory specified by --co{mpare,py}-dest.
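What comment #5 asks for can be sketched as follows. This is a hypothetical illustration, not rsync's implementation: the function fuzzy_candidates and its alt_dest_dirs parameter are invented here to contrast scanning only the destination directory (the 2.6.7 behavior described above) with also scanning the directories named by --compare-dest or --copy-dest (the requested behavior).

```python
import os
import tempfile


def fuzzy_candidates(dest_dir, alt_dest_dirs=()):
    """List regular files that a fuzzy search could consider as basis files.

    Hypothetical: dest_dir alone models the 2.6.7 behavior; passing
    alt_dest_dirs models the request to include the --compare-dest or
    --copy-dest directories as well.
    """
    candidates = []
    for directory in (dest_dir, *alt_dest_dirs):
        for entry in sorted(os.listdir(directory)):
            path = os.path.join(directory, entry)
            if os.path.isfile(path):
                candidates.append(path)
    return candidates


with tempfile.TemporaryDirectory() as new_dir, tempfile.TemporaryDirectory() as old_dir:
    # An older package version sits in a different directory than the target.
    open(os.path.join(old_dir, "openoffice_1.0-1.deb"), "w").close()
    only_dest = fuzzy_candidates(new_dir)
    with_alt = fuzzy_candidates(new_dir, [old_dir])
    print(len(only_dest), len(with_alt))
```

In this toy setup the destination directory is empty, so a search limited to it finds no basis file, while the extended search sees the older package.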
samba-bugs@samba.org
2006-Mar-13 15:57 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

wayned@samba.org changed:

What       |Removed  |Added
----------------------------------------------------------------------------
Status     |REOPENED |RESOLVED
Resolution |         |FIXED

------- Comment #6 from wayned@samba.org 2006-03-13 09:57 MST -------
Fuzzy is already a very expensive operation, and making it even more expensive is not a good idea, IMO. I want to leave it as it has always been defined: performing a fuzzy search in the destination directory.
samba-bugs@samba.org
2006-Mar-13 16:32 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

egmont@uhulinux.hu changed:

What       |Removed |Added
----------------------------------------------------------------------------
Status     |CLOSED  |REOPENED
Resolution |FIXED   |

------- Comment #7 from egmont@uhulinux.hu 2006-03-13 10:32 MST -------
The fuzzy option is much less expensive than downloading a file from scratch instead of using a similar local file to compare to.

My goal is, by the way, nothing more than maintaining rsync support for apt (the front-end for dpkg), using up-to-date tools and possibly mainstream solutions (that is, as few patches as possible). Please read the original report here: http://lists.debian.org/debian-devel/2003/07/msg00462.html

The whole design of "apt" forces me into an environment where I have to download the new package into a different directory than where old packages reside, but I still want to make use of the fuzzy option so that people don't have to download 120 MB once a typo is fixed in the openoffice package.

With rsync 2.6.6, despite the bug I reported, I still had an extremely easy workaround: I just had to put a chdir() call between the fork and exec in apt. Due to the fact that this "bug" of rsync-2.6.6 is "fixed" now, it still doesn't work out of the box, but at least now I don't even have such a simple workaround. So after all the situation became worse.

Please read the linked mail and try to understand my needs and their reasons (which are not artificial, they were brought up by the real world). I hope you will understand why it would be so important for all the users of our distribution that these two options worked together perfectly.

But even if I failed to convince you, I'm reopening this bug since in this case rsync should explicitly refuse the --compare-dest and similar options with an error message if --fuzzy is also specified.
samba-bugs@samba.org
2006-Mar-13 16:58 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

wayned@samba.org changed:

What       |Removed  |Added
----------------------------------------------------------------------------
Status     |REOPENED |RESOLVED
Resolution |         |FIXED

------- Comment #8 from wayned@samba.org 2006-03-13 10:57 MST -------
The fuzzy algorithm is very expensive the more files it rates, so in larger transfers it would balloon into way too many fuzzy computations.
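The cost argument in comment #8 can be made concrete with a rough count. The sketch below is a simplified model, not rsync's scoring code: it uses Python's difflib.SequenceMatcher as a stand-in for the name-similarity rating (an assumption), and it simply counts how many ratings are performed when each missing file is compared against every file name in the searched set. All file names are made up.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(missing_name, existing_names):
    """Return (best_name, ratings_performed) using a stand-in similarity score."""
    best_name, best_score, ratings = None, 0.0, 0
    for candidate in existing_names:
        ratings += 1
        score = SequenceMatcher(None, missing_name, candidate).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name, ratings


# A busy directory with many unrelated files plus one older package version.
existing = [f"msg{i:04d}" for i in range(200)] + ["openoffice_1.0-1.deb"]
missing = ["openoffice_1.0-2.deb", "newmail0001", "newmail0002"]

total_ratings = 0
for name in missing:
    match, ratings = best_fuzzy_match(name, existing)
    total_ratings += ratings

# One rating per (missing file, existing file) pair.
print(total_ratings == len(missing) * len(existing))
```

Under this model the work grows with the product of the number of missing files and the number of names searched, which is why a large, active directory with many missing files is the unfavorable case.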
samba-bugs@samba.org
2006-Mar-13 17:21 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

------- Comment #9 from wayned@samba.org 2006-03-13 11:20 MST -------
One way to manually optimize a transfer for a new directory that is related to an old directory is to first copy the old directory using --link-dest, and then copy the new directory using --fuzzy:

    rsync -av --link-dest=../olddir olddir/ dest:newdir
    rsync -av --fuzzy --delete-after newdir dest:

The first rsync run will just hard-link everything into the newdir destination as long as you have a local copy of the identical olddir files.
samba-bugs@samba.org
2006-Mar-14 11:46 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

------- Comment #10 from egmont@uhulinux.hu 2006-03-14 05:46 MST -------
Dear Wayne,

To comment #8:

If you'd say "it's too hard to fix it", "we have no resources to implement this" or something similar, then I'd most likely accept it. But your argument of fuzzy being expensive is plainly bullshit, for two reasons:

First, I don't know if you have used this feature or not. I have used it, and it saved me many hours by being able to synchronize 2 GB of data behind a 1 Mbit/sec ADSL line in half an hour rather than in 4.5 hours. And if I save 4 hours in my life then I really don't care whether rsync needs 1 second or 1 minute of CPU time. Actually I never had a noticeable load caused by rsync. I hope you agree that 4 hours of wall clock time is much more expensive than several seconds (or maybe a few minutes) of CPU time.

Second, the number of fuzzy computations doesn't depend on whether you compare to the file listing of the same directory, or to the file listing of another directory. So if we'd accept your argument that fuzzy needs too much CPU resources, then the whole fuzzy option should completely be dropped from rsync, even if someone wants to use it without the --copy-dest option. I hope this is not the way to go.

So we're talking about two features which should be completely orthogonal to each other, but still they don't work together, for apparently no sane reason. This is just as stupid as if, let's say, you would be unable to preserve permissions (-p) when --delete is in effect.

To comment #9:

I'll try it but I'm not sure it will be as simple as you imagine it. Apt performs other operations on its cache, for example moving a file from "newdir" to "olddir", and I don't know if it will fail if a hardlink is already present there.
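The figures quoted in comment #10 are consistent with simple arithmetic. The sketch assumes decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits) and ignores protocol overhead; both are assumptions, since the comment does not say.

```python
def transfer_hours(size_bytes, link_bits_per_second):
    """Time to move size_bytes over a link of the given raw speed, in hours."""
    return size_bytes * 8 / link_bits_per_second / 3600


# 2 GB over a 1 Mbit/sec line, with nothing reused from local files.
full_download = transfer_hours(2 * 10**9, 1 * 10**6)
print(round(full_download, 2))  # about 4.44 hours, close to the quoted 4.5

# The reported half-hour run corresponds to sending roughly this fraction
# of the data at the same link speed.
fraction_sent = 0.5 / full_download
print(round(fraction_sent, 2))
```

On these assumptions, a half-hour run means only about a ninth of the data crossed the link, with the rest reconstructed from similar local files.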
samba-bugs@samba.org
2006-Mar-15 16:28 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

------- Comment #11 from wayned@samba.org 2006-03-15 10:27 MST -------
I appreciate that in your specific circumstance an enhanced behavior of --fuzzy would be useful. My testing of the --fuzzy option has also included large transfers with many missing files where it would be a huge detriment. For instance, I have tried to use --fuzzy to transfer a large Maildir hierarchy, and the copy into a very large and active folder was so CPU intensive that it bogged the transfer down instead of speeding it up. This is because every missing file requires its own separate fuzzy computation, and that computation gets slower the more files there are in the destination directory.

So, the --fuzzy option is already too CPU intensive for its own good in some circumstances. Given that, I don't want the file-set that fuzzy compares against to be made larger by default. It could possibly be made an optional behavior, e.g. requested by doubling the --fuzzy option, but that suggestion would best be made in a separate enhancement request (since it's best to target a bug-report at a specific issue, and this specific issue of the wrong directory being scanned for fuzzy matches has been fixed).

Finally, some discussion of bug-tracking netiquette: please note that reopening a bug report twice in a row is considered to be a very rude action. While that might not make much sense logically, rudeness is more emotional than logical. I was once chastised for reopening a bug just once to ask a question that I thought might not have been considered in the closing of the bug. At the time I thought that the fellow was being overly dramatic, but after some experience of being on the other side, I realized that the reopening action actually conveys meaning that may not be intended by the reopener. Thus, a good rule of thumb when dealing with a bug that is not in obvious need of being reopened is this:

1. Add a comment to the closed bug raising the issue.
2. Check for a consensus for reopening or starting a new bug report.
3. If there is no response, reopening might be needed to ensure that the issue is not forgotten.

OK? Thanks for your input, and feel free to open an enhancement request for the --fuzzy option if you'd care to do so.
samba-bugs@samba.org
2006-Mar-16 10:52 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

------- Comment #12 from egmont@uhulinux.hu 2006-03-16 04:52 MST -------
I do agree that there are circumstances when fuzzy doesn't speed up things, or actually causes huge performance regression. On the other hand, as I described, there are also cases when it really helps a lot. It should be up to the users to choose whether to use it or not.

You said:
> I don't want the file-set that fuzzy compares against to be made
> larger by default.

I perfectly agree. I never talked about the _default_. I talked about the case where --compare-dest=... is also specified in addition to --fuzzy. This is clearly not the default; in this case the user explicitly asks for a larger file set to be fuzzy-compared to, hopefully knowing its pros and cons.

About netiquette: I have seen several netiquettes, mostly about e-mail, but I can't remember seeing a bug-reporting netiquette anywhere. Please point me to a URL, I'll be happy to read it and follow its guidelines.

First I reopened the bug since it was closed falsely: the bug mentioned in the original report is _not_ fixed (comment #5). For the second time (comment #7) I reopened it to (1) note that rsync doesn't mention its current behavior in its docs and doesn't refuse options that don't work together, and (2) to state that your arguments containing "IMO" and "I want to leave it..." are not quite strong arguments.

For (1) I could have opened another report, I think it's quite a matter of taste. Some prefer to split each and every single step of a bigger problem set to a different issue, some rather want to see "co-operation of --compare-dest and --fuzzy" as one large bug that says: all details of this problem set should be solved. Seems that you belong to the first group, while I belong to the second one. Note that opening a new issue has at least one drawback: the cc list is lost. I think anyone who was interested in the first report is still interested in how fuzzy and compare-dest will work together.

By the way: there are other ways to close a bug, there is INVALID, there is WONTFIX etc. These are not in netiquette, these are in the manual and UI of bugzilla. Still you chose FIXED. Why? Please read the _original_ report once again. It is _not_ fixed.

The third time I commented (comment #10), _before_ you told it's rude to reopen bugs, I did _not_ reopen it, neither do so now.

Do you think that closing a bug twice in the middle of a conversation, while that bug is not yet fixed, is not a rude action at all? Especially in comment #6 where you had no real argument at all, just a personal taste?

> Thanks for your input, and feel free to open an enhancement request for the
> --fuzzy option if you'd care to do so.

So, if rsync doesn't work the way it should is not a bug, fixing it is only an enhancement request? And if I open a new bug then it is likely to be fixed (ohh, sorry, implemented as a new feature) but if I tell about it here then it will be forgotten? Despite that I still do not ask for anything more than to fix what I reported in my _original_ submission in this topic? Shall I really open another report of the same problem that I originally reported here?

Sorry, I got tired of it. I'm happy to help the free software community anywhere, sometimes with bug reports or enhancement requests, sometimes with patches, sometimes with new sourcecode, with translations, or a lot of other stuff, but only as long as I don't keep on hitting walls during my work. I have no time and no power to try to fight against people who try to offend my requests or my work.

It's clear that you do not want to fix it. I cannot force you to do so. So just don't fix it. Leave it as it is now. Leave it as CLOSED _FIXED_ (khmmm). And let's forget this whole issue...
samba-bugs at samba.org
2009-Nov-13 04:16 UTC
DO NOT REPLY [Bug 3392] fuzzy misbehaving if source is a file
https://bugzilla.samba.org/show_bug.cgi?id=3392

------- Comment #13 from matt at mattmccutchen.net 2009-11-12 22:16 CST -------
I rediscovered this bug just now and thought I would comment on what happened so it doesn't stand as a blemish on the community.

Wayne and Egmont disagreed on the merits of the request for --fuzzy to search --*-dest dirs. That's perfectly normal. What isn't normal is the miscommunication about the status of the request. Wayne wanted it to be filed as a separate ticket, a decision that I would generally grant developers the prerogative to make, but he never came out and said so explicitly until comment #11. Egmont considered it to be permanently within the scope of this ticket because it was stated in comment #0 and perceived that Wayne was simply trying to bury the issue, hence the reopen battle and the dissatisfaction with the FIXED resolution as expressed in comment #12.

Wayne, if you had stated your desire to separate the issues in comment #6, the onus would have been on Egmont to file the enhancement request separately before proceeding, and the discussion could have proceeded there, perhaps with fervent disagreement and a WONTFIX decision but without acrimony. I would encourage both reporters and developers to avoid a repeat of this situation by considering, before a repeated reopen or resolution, whether there might be a simple miscommunication at fault.

Note that I went ahead and entered the enhancement as bug 4056 at the time, and it remains open.