Thomas Güttler
2017-Feb-09 09:05 UTC
Huge directory tree: Get files to sync via tools like sysdig
Hi,

we have a huge directory tree:

* 17M files
* 2.2 TB of data
* Only 0.1% changes per day

Current pain: rsync's directory tree traversal takes too long to discover the changed files. Only a few files change.

I discovered the tool sysdig, which could be used to monitor which files were changed. We could then feed the list of changed files to rsync and avoid rsync's long directory traversal.

Does anyone have experience with collecting the changed files with a third-party tool that detects which files were changed?

Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/
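The idea of feeding a list of changed files to rsync can be sketched as follows. This is only a minimal illustration, not a tested setup: the function names and example paths are hypothetical, and it assumes that some monitoring tool already produces absolute paths of changed files. It relies on rsync's `--files-from` and `--from0` options; note that with `--files-from`, `-a` does not imply recursion, and deleted files are not covered by this approach.

```python
import os


def write_file_list(changed_paths, src_root, list_file):
    """Write the changed paths relative to src_root, NUL-separated.

    Duplicates are dropped and the result is sorted so the list is stable.
    Returns the relative paths that were written.
    """
    rel_paths = sorted({os.path.relpath(p, src_root) for p in changed_paths})
    with open(list_file, "wb") as fh:
        for rel in rel_paths:
            # NUL separators (read with --from0) survive odd file names.
            fh.write(os.fsencode(rel) + b"\0")
    return rel_paths


def build_rsync_command(list_file, src_root, dest):
    """Build an rsync argv that transfers only the listed files."""
    return [
        "rsync",
        "-a",
        "--from0",
        f"--files-from={list_file}",
        src_root,
        dest,
    ]
```

The resulting argument list could be passed to `subprocess.run(cmd, check=True)`. Because rsync then only looks at the listed paths, the full tree traversal is skipped.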
Axel Kittenberger
2017-Feb-09 09:55 UTC
Huge directory tree: Get files to sync via tools like sysdig
> Has someone experience with collecting the changed files
> with a third party tool which detects which files were changed?

I don't know sysdig, but I am the developer of Lsyncd, which does exactly that: it collects file changes via the inotify event mechanism and then calls rsync with a matching filter mask.

However, since you say your directory tree is huge, the main issue is that an inotify watch must be created for every directory, taking about 1 KB of kernel memory per watch. If you have a million directories, that is about 1 GB of unswappable memory. Unfortunately, the Linux kernel doesn't provide a better way yet, and I suppose other tools like sysdig suffer from the same issue. There is fanotify, but it doesn't report move events and is thus not usable for this task.

Kind regards,
Axel
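The memory estimate above can be checked against a concrete tree with a short sketch like the following. It assumes the rough figure of about 1 KB per watch quoted in this message (the real per-watch cost depends on the kernel), and the function names are hypothetical. Counting the directories itself requires one full traversal.

```python
import os

# Assumption: the rough 1 KB-per-watch figure quoted in the message above.
BYTES_PER_WATCH = 1024


def count_directories(root):
    """Count root and every directory below it (one inotify watch each)."""
    # os.walk yields exactly one tuple per directory it visits.
    return sum(1 for _ in os.walk(root))


def estimate_watch_memory(num_dirs, bytes_per_watch=BYTES_PER_WATCH):
    """Estimate the kernel memory (in bytes) needed to watch num_dirs directories."""
    return num_dirs * bytes_per_watch


def read_max_user_watches(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None
```

For one million directories, `estimate_watch_memory(1_000_000)` gives 1,024,000,000 bytes, which matches the "about 1 GB" figure. Comparing `count_directories(root)` with `read_max_user_watches()` shows whether the watch limit would have to be raised before a tool like Lsyncd could watch the whole tree.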
Ben RUBSON
2017-Feb-09 10:05 UTC
Huge directory tree: Get files to sync via tools like sysdig
> On 09 Feb 2017, at 10:05, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
>
> Hi,
>
> we have a huge directory tree.
>
> * 17M files (number of files)
> * 2.2TBytes of data.
> * Only 0.1% changes per day
>
> Current pain: rsync's directory tree traversal takes too long to discover the changed files.

Hi,

Which type of filesystem is this directory on?

Ben
Karl O. Pinc
2017-Feb-09 13:25 UTC
Huge directory tree: Get files to sync via tools like sysdig
On Thu, 9 Feb 2017 10:55:51 +0100 Axel Kittenberger <axkibe at gmail.com> wrote:

> > Has someone experience with collecting the changed files
> > with a third party tool which detects which files were changed?
>
> I don't know of sysdig but am the developer of Lsyncd which does
> exactly that, collect file changes via inotify event mechanism and
> then calls rsync with a matching filter mask.
>
> However, since you say, your directory tree is huge, the main issue
> is that for every directory an inotify watch must be created, taking
> about 1KB of kernel memory per watch.

Not only that, but inotify is not guaranteed to deliver every event (at least not on 3.16.0; I can't speak for later versions), so you might miss some changes.

Karl <kop at meme.com>
Free Software: "You don't pay back, you pay forward."
    -- Robert A. Heinlein
Thomas Güttler
2017-Feb-09 15:10 UTC
Huge directory tree: Get files to sync via tools like sysdig
On 09.02.2017 at 11:05, Ben RUBSON wrote:

>> On 09 Feb 2017, at 10:05, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
>>
>> Hi,
>>
>> we have a huge directory tree.
>>
>> * 17M files (number of files)
>> * 2.2TBytes of data.
>> * Only 0.1% changes per day
>>
>> Current pain: rsync's directory tree traversal takes too long to discover the changed files.
>
> Hi,
>
> On which type of FS is this directory ?

ext4

--
Thomas Guettler http://www.thomas-guettler.de/
Apparently Analogous Threads
- Huge directory tree: Get files to sync via tools like sysdig
- Huge directory tree: Get files to sync via tools like sysdig
- Alternatives to rsync. Was: Huge directory tree: Get files to sync via tools like sysdig
- [Bug 12569] New: Missing directory errors not ignored
- How to sync an exact list of files, Including deletes!?