samba-bugs at samba.org
2017-Jun-05 18:47 UTC
[Bug 12819] New: [PATCH] sync() on receiving side for data consistency
https://bugzilla.samba.org/show_bug.cgi?id=12819

Bug ID: 12819
Summary: [PATCH] sync() on receiving side for data consistency
Product: rsync
Version: 3.1.2
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5
Component: core
Assignee: wayned at samba.org
Reporter: ben.rubson at gmail.com
QA Contact: rsync-qa at samba.org

Created attachment 13253 --> https://bugzilla.samba.org/attachment.cgi?id=13253&action=edit
rsync_sync

Hello,

Here is a patch that calls sync() once the files have been received, for data consistency.

Thank you!

Ben

--
You are receiving this mail because:
You are the QA Contact for the bug.
samba-bugs at samba.org
2017-Jun-13 17:37 UTC
[Bug 12819] [PATCH] sync() on receiving side for data consistency
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #1 from Brian K. White <brian at aljex.com> ---
This seems wrong to me.

If the OS is failing to manage write buffers and file access between processes, you would have a lot bigger problems in every process all through the system, and this wouldn't fix it.

Similarly, if rsync were corrupting data, a lot of people would already know about it. It gets used way too much and too heavily for anything like this to go unnoticed for more than a day, let alone 15 or more years.

It's almost axiomatic: No matter what problem you think you have, no matter what language or OS or platform, if you think it's fixed by either sleep() or sync(), it's not.
samba-bugs at samba.org
2017-Jun-14 07:45 UTC
[Bug 12819] [PATCH] sync() on receiving side for data consistency
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #2 from Ben RUBSON <ben.rubson at gmail.com> ---
Thank you for your feedback, Brian.

I don't have any problem. I just want to be sure that when the client (sender) has finished its transfer, its data is on the server's (receiver's) disks before it disconnects, so that when it disconnects successfully, its data is definitely on disk. "On disk" means on the platters, so that if there is a failure (hardware, power...), the data is safe, not lost.

Of course, disks that do not lie about the sync() command must be used (data must be on the platters, not only in the disks' cache), as well as a robust filesystem, some redundancy... (but that's off-topic here).

Perhaps we could make it an option, so that those whose OS is failing to manage write buffers would not see performance degrade even further... But they should certainly look at their performance issue first.
samba-bugs at samba.org
2017-Jun-14 09:56 UTC
[Bug 12819] [PATCH] sync() on receiving side for data consistency
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #3 from Paul Slootman <paul at debian.org> ---
How about just using a post-xfer command on the server side that does 'sync'?
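
A post-xfer command of the kind suggested here could look roughly like the hook below. This is only a sketch under stated assumptions: it assumes the rsync daemon's `post-xfer exec` parameter is pointed at this script, that a Python interpreter is available where the hook runs, and that the host is Unix-like (`os.sync()` is not available elsewhere). `RSYNC_EXIT_STATUS` is the environment variable the daemon exports to post-xfer commands; the function name `sync_after_transfer` is hypothetical.

```python
import os


def sync_after_transfer(environ, do_sync=os.sync):
    """Request a flush of OS write buffers if the transfer succeeded.

    Returns True when a sync was requested, False otherwise.
    `do_sync` is injectable so the logic can be exercised without
    actually flushing the whole system.
    """
    # The rsync daemon sets RSYNC_EXIT_STATUS for post-xfer commands;
    # "0" means the transfer finished without error.
    if environ.get("RSYNC_EXIT_STATUS") != "0":
        return False
    do_sync()  # the same system-wide flush the `sync` command requests
    return True


if __name__ == "__main__":
    sync_after_transfer(os.environ)
```

The hook does nothing when the variable is missing or non-zero, so a failed transfer does not trigger a flush.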
samba-bugs at samba.org
2017-Jun-14 10:07 UTC
[Bug 12819] [PATCH] sync() on receiving side for data consistency
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #4 from Ben RUBSON <ben.rubson at gmail.com> ---
Yes Paul, I thought about it, but the sync command may not be available if the server (receiver) is chrooted (for example, using the patch proposed in #12817).
samba-bugs at samba.org
2017-Jun-14 18:42 UTC
[Bug 12819] [PATCH] sync() on receiving side for data consistency
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #5 from Brian K. White <brian at aljex.com> ---
Any program could make this same "just to be safe" argument practically every time it closes a file it has written to, for any reason. If it wrote anything, it was always for some reason, and it wants to know for sure that the data really got safely written.

There is nothing special about rsync in that regard. cp might as well have it. The ">" operator in bash might as well have it.

The kernel and vfs and hardware drivers all already do whatever is necessary in that regard, and it's generally wrong for any application to try to do it itself. Otherwise the disk would be in a constant state of sync()'ing and never actually manage to get any other work done.

Consider a multiuser host with 500 rsync receivers. Each individual sync() is incredibly disruptive to all other processes. "Everyone hold up while we flush the disk buffer...". The entire system waits while that happens.

That way just leads to things like the example you just used: lower layers that start lying about sync() to upper layers because too many apps use it when they shouldn't. "Fine, if apps are going to sync all the time, that ends up being 86 times a second between all procs running at any given moment, which is unsupportable, so we'll just make sync() a no-op stub and we'll do it when it's actually required, and apps can sync() away to their hearts' content".

I think the only reason rsync might have to sync is if you built rsync as a self-contained bootable executable like memtest86, or possibly as an MS-DOS executable.
samba-bugs at samba.org
2017-Jun-14 19:01 UTC
[Bug 12819] [PATCH] sync() on receiving side for data consistency
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #6 from Brian K. White <brian at aljex.com> ---
Think of it this way: write() already makes a certain promise that it will not return until it's done its job, and it will not assert success when it can't.

Essentially the man page for any syscall is a contract. In fact all APIs are contracts.

write() in turn relies on various other calls to even lower layers to keep their promises too, to manage the in-kernel buffer or the cache on a raid card etc.

All of these things MUST be relied on rather than second-guessed.

It would be insane, for example, for write() to say "I can't really be sure this disk driver has really done its thing. I better force it to sync before I return to the application." or "I can't really be sure malloc() really allocated the memory, I better malloc 3 or 4 copies and compare them and use whichever copies agree with each other..."

It's insane. You write(), you check the return value, and you're done. The low level hardware is someone else's job, and you won't be doing a better job than they already did.
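
The "write(), check the return value" pattern described above can be sketched as follows. This is an illustration rather than anything taken from rsync: `write_all` is a hypothetical helper, and Python's `os.write()` is used here as a thin wrapper over the write() system call.

```python
import os


def write_all(fd, data):
    """Write every byte of `data` to `fd`, checking each return value."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write() returns how many bytes were accepted, which may be
        # fewer than requested; it raises OSError if the call fails.
        written = os.write(fd, view[total:])
        if written == 0:
            raise OSError("write() made no progress")
        total += written
    return total
```

A successful return here means the kernel accepted the data. Whether the application should additionally request a flush to stable storage is the question the rest of this thread argues over.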
samba-bugs at samba.org
2017-Jun-15 13:23 UTC
[Bug 12819] [PATCH] sync() on receiving side for data consistency
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #7 from Ben RUBSON <ben.rubson at gmail.com> ---
And what about a power failure between 2 ZFS transaction groups?

Note that my patch simply adds a sync() just after recv_files(), so one sync() per connection, not per write operation. Quite a low workload actually :)

But we could make this an rsync option, so that users can enable / disable it themselves.
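
The shape of that proposal, one optional sync() per connection rather than one per write, can be sketched as below. This is not the attached patch (which is C code inside rsync); `receive_files` merely stands in for rsync's recv_files(), and the `sync_when_done` flag is a hypothetical name for the suggested option. It assumes a Unix-like host where `os.sync()` exists.

```python
import os


def receive_files(files, dest_dir, sync_when_done=False, do_sync=os.sync):
    """Write each (name, data) pair under dest_dir, then optionally sync once.

    `files` is a list of (filename, bytes) tuples. `do_sync` is injectable
    so the control flow can be checked without flushing the whole system.
    """
    for name, data in files:
        path = os.path.join(dest_dir, name)
        with open(path, "wb") as f:
            f.write(data)
    # A single flush for the whole session, not one per write or per file.
    if sync_when_done:
        do_sync()
    return len(files)
```

With the flag left at its default, the function behaves as if the option did not exist, which is how an opt-in setting would leave existing users unaffected.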
Karl O. Pinc
2017-Jun-15 17:29 UTC
[Bug 12819] [PATCH] sync() on receiving side for data consistency
On Thu, 15 Jun 2017 13:23:44 +0000 just subscribed for rsync-qa from bugzilla via rsync <rsync at lists.samba.org> wrote:

> https://bugzilla.samba.org/show_bug.cgi?id=12819
>
> --- Comment #7 from Ben RUBSON <ben.rubson at gmail.com> ---

> Note that my patch simply adds a sync() just after recv_files(), so
> one sync() per connection, not per write operation.

> But we could make this a rsync option, so that one can enable /
> disable it on its own.

I think the "right" rsync option to add (because rsync does not have enough options already ;-) is a --hook-post option. It would run something (a `sync` in your case) on the remote end after finishing.

There are clear security issues here. Rather than having --hook-post and having to do something (a server side config option that says what --hook-post can do?) to address the security concerns it seems much simpler to improve the rsync documentation regarding running the rsync server side.

I'm still using command="rsync --server --daemon ." in my ~/.ssh/authorized_keys file on the remote end. It'd be simple enough to add, say, a "sync" to the end of this to force a sync when rsync finishes.

The problem is that the --server (and, especially, --daemon) documentation has gone away. Or at least left the man page. (v3.1.1, Debian 8, Jessie) Except for a hint that --server exists at the bottom.

If the server side of rsync was better documented then perhaps a simple inetd rsync service (or --rsync-path or -e value, etc.) would be easy for the end-user to cobble together to meet needs such as this.

Can somebody please explain --server? (And --sender, I guess.) I might (possibly) be motivated to send in a man page patch.

Regards,

Karl <kop at meme.com>
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
samba-bugs at samba.org
2017-Jun-15 18:53 UTC
[Bug 12819] [PATCH] sync() on receiving side for data consistency
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #8 from Brian K. White <brian at aljex.com> ---
You tell me, what ABOUT a power failure between 2 zfs, or any other fs operations?

This does not improve or solve any problem that the fs and all the other layers aren't already handling.

This is simply a misguided idea, however sensible and attractive it seems.
samba-bugs at samba.org
2020-May-26 17:21 UTC
[Bug 12819] [PATCH] sync() on receiving side for data consistency
https://bugzilla.samba.org/show_bug.cgi?id=12819

Ben RUBSON <ben.rubson at gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED

--- Comment #9 from Ben RUBSON <ben.rubson at gmx.com> ---
Patch moved: https://github.com/WayneD/rsync/pull/4