Karl O. Pinc
2017-Feb-09 13:25 UTC
Huge directory tree: Get files to sync via tools like sysdig
On Thu, 9 Feb 2017 10:55:51 +0100 Axel Kittenberger <axkibe at gmail.com> wrote:

> > Has someone experience with collecting the changed files
> > with a third party tool which detects which files were changed?
>
> I don't know of sysdig but am the developer of Lsyncd which does
> exactly that, collect file changes via inotify event mechanism and
> then calls rsync with a matching filter mask.
>
> However, since you say, your directory tree is huge, the main issue
> is that for every directory an inotify watch must be created, taking
> about 1KB of kernel memory per watch.

Not only that, but inotify is not guaranteed. (At least not on 3.16.0. I can't speak for later versions.) So you might miss some changes.

Karl <kop at meme.com>
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
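To put a rough number on the per-directory cost mentioned above, here is a minimal sketch that counts the directories in a tree and multiplies by the "about 1KB per watch" figure quoted in this thread. The 1 KB value is only the estimate given above, not a measured constant, and the function names are made up for illustration. The limit file /proc/sys/fs/inotify/max_user_watches is the standard Linux sysctl location.

```python
import os

# Assumption taken from the thread: roughly 1 KB of kernel memory per watch.
BYTES_PER_WATCH = 1024


def count_directories(root):
    """Count root plus every sub-directory below it (one watch each)."""
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total


def estimate_watch_memory(root, bytes_per_watch=BYTES_PER_WATCH):
    """Return (watches needed, estimated kernel bytes) for the tree at root."""
    watches = count_directories(root)
    return watches, watches * bytes_per_watch


def read_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user watch limit, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None
```

Calling estimate_watch_memory("/srv/data") (a placeholder path) and comparing the first value against read_watch_limit() gives a quick idea of whether a watch-per-directory approach is feasible before trying it.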
Axel Kittenberger
2017-Feb-09 13:43 UTC
Huge directory tree: Get files to sync via tools like sysdig
> Not only that, but inotify is not guaranteed. (At least not on
> 3.16.0. Can't say regards later versions.) So you might miss some
> changes.

Got any info on that?

I noted that MOVED_FROM and MOVED_TO events are not guaranteed to arrive in order, or the file descriptor might even briefly close with "no more events" in between them. But I have never heard of anybody encountering an event in a watched directory not being reported correctly without also being told of the overflow via an OVERFLOW event, which in Lsyncd's case results in a full rescan of everything.
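The two behaviours described above (move events that may not arrive in order, and an overflow event that forces a full rescan) can be sketched as follows. This is not Lsyncd's code; it is a small illustration that parses a raw buffer as read from an inotify file descriptor, using the struct inotify_event layout and the mask values from <sys/inotify.h>. The function names are hypothetical.

```python
import struct

# Mask bits from <sys/inotify.h>.
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080
IN_Q_OVERFLOW = 0x00004000

# struct inotify_event header: int wd; uint32_t mask, cookie, len;
_HEADER = struct.Struct("iIII")


def parse_events(buf):
    """Split a raw inotify read buffer into (wd, mask, cookie, name) tuples."""
    events = []
    offset = 0
    while offset + _HEADER.size <= len(buf):
        wd, mask, cookie, name_len = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        raw = buf[offset:offset + name_len]
        offset += name_len
        # The name field is NUL-padded; keep only the part before the padding.
        name = raw.split(b"\0", 1)[0].decode("utf-8", "surrogateescape")
        events.append((wd, mask, cookie, name))
    return events


def needs_rescan(events):
    """True if the kernel queue overflowed and events were dropped."""
    return any(mask & IN_Q_OVERFLOW for _wd, mask, _cookie, _name in events)


def pair_moves(events):
    """Match MOVED_FROM/MOVED_TO by cookie, regardless of arrival order."""
    froms, tos = {}, {}
    for wd, mask, cookie, name in events:
        if mask & IN_MOVED_FROM:
            froms[cookie] = (wd, name)
        elif mask & IN_MOVED_TO:
            tos[cookie] = (wd, name)
    paired = [(froms[c], tos[c]) for c in froms if c in tos]
    unpaired = [v for c, v in froms.items() if c not in tos]
    unpaired += [v for c, v in tos.items() if c not in froms]
    return paired, unpaired
```

Because the halves of a move are collected first and matched afterwards, the order in which they arrive within a batch does not matter. Halves left unpaired would have to be held over to the next read, or treated as a plain delete or create.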
Karl O. Pinc
2017-Feb-09 14:05 UTC
Huge directory tree: Get files to sync via tools like sysdig
On Thu, 9 Feb 2017 14:43:57 +0100 Axel Kittenberger <axkibe at gmail.com> wrote:

> > Not only that, but inotify is not guaranteed. (At least not on
> > 3.16.0. Can't say regards later versions.) So you might miss some
> > changes.
>
> Got any info on that?
>
> I noted that MOVED_FROM and MOVED_TO events are not guaranteed to arrive
> in order, or even the file descriptor might briefly close with "no
> more events" in between them, but I never ever heard of anybody
> encountering an issue of an event in a watched directory not being
> correctly reported, without getting the information of an overflow
> with an OVERFLOW event, which results in case of Lsyncd in a full
> rescan of everything.

Not much. inotify(7) on my system says:

    With careful programming, an application can use inotify to
    efficiently monitor and cache the state of a set of filesystem
    objects. However, robust applications should allow for the fact
    that bugs in the monitoring logic or races of the kind described
    below may leave the cache inconsistent with the filesystem state.
    It is probably wise to do some consistency checking, and rebuild
    the cache when inconsistencies are detected.

I think one of the pretty much unavoidable race conditions is sub-directory creation: the sub-directory can have files added to it before the monitoring process is able to set a watch on it. Of course, this is an application-level race.

I've had incron (which uses inotify) regularly fail to catch all monitored filesystem changes on a busy system. The monitored system does not involve creating sub-directories, and I don't think I'm exceeding the system's inotify event limit either. But I could be wrong about either of these.

So perhaps the take-away is that inotify is "hard", or even "impossible", to rely on as the sole method for change monitoring. It may not be right to say it's "unreliable" as I did above. I'm not the expert here. But I can say that my limited experience with it makes me want to look very closely before relying on it.

Regards,

Karl <kop at meme.com>
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
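The "consistency checking" that the man page recommends could look something like the sketch below: take a periodic snapshot of the tree and compare it with the previous one, independently of inotify. This is only an illustration under simple assumptions (size and mtime are enough to detect a change, and the function names are invented here); it also catches files created in a new sub-directory before a watch was set on it.

```python
import os


def snapshot(root):
    """Map each file's path (relative to root) to (size, mtime in ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def diff_snapshots(old, new):
    """Return sorted lists of (added, removed, changed) relative paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed
```

The added and changed lists could be written to a file and handed to rsync with --files-from, while an event-driven tool handles the common case in between scans. On a huge tree the walk itself is expensive, so this is a fallback to run occasionally, not a replacement for event monitoring.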