$ wc -l /tmp/list
1000 /tmp/list

$ rsync -i -aPv --ignore-existing --files-from=/tmp/list /backups/ ut00-s00010:/backups/
building file list ...
3937 files to consider

I am totally baffled.

That's not such a big deal, but the list I'm *actually* using has
twenty *million* files in it.  At a couple hundred files a second,
if it's going to check 4 times the number of files, that's a *huge*
time waste.  What's going on?

Here's what the list looks like:

$ head /tmp/list
cpool/b/c/5/bc5ea7a79a4824c6729645c66b562e6b
cpool/7/7/8/77865de94585b4581f07e54065c7b1e3
cpool/2/5/0/250f326bfa69c9da011f809a8b46cea7
cpool/3/3/8/3382672447e7f9a00ea755cee7ad5187
cpool/1/0/e/10eec0876f979ca8773f63e697be0adf
cpool/0/e/b/0ebf2a81c863702baa4eb38ec3cef655
cpool/3/6/c/36c915e781561292d9ae73e127504d0d
cpool/b/5/0/b50dcb17dac0808c4b5de1a9a3b747af
cpool/8/5/f/85fb8dc29ed1597c3fd0725ff91da279
cpool/9/0/8/90829abb5879fcbe39c2f55c4211b3c5

They are all like that, and they are all files, not directories.

I thought it could be rsync checking the directories that have those
files in them, but there are only 4300 directories, and when I
stopped the big version (OK, *that* was a mistake, but I was worried
about the behaviour) it was saying "28395900 files...", which is
rather a lot more than 20 million + 4300.

This is making a many hours difference to an already very long
process; anyone know what's going on?

-Robin

--
http://singinst.org/ :  Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei".  My personal page: http://www.digitalkingdom.org/rlp/
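[A quick way to sanity-check the directory count from the shell, assuming
every entry follows the cpool/X/Y/Z/<hash> pattern shown in the head output
above; the awk one-liner is an illustration, not something from the thread:

  $ awk -F/ '{ p = $1; print p; for (i = 2; i < NF; i++) { p = p "/" $i; print p } }' /tmp/list | sort -u | wc -l

This prints every parent directory each path implies, de-duplicates the
result, and counts it, i.e. the number of distinct directories rsync could
possibly need to visit for that list.]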
On Fri, Dec 10, 2010 at 09:11:39AM -0800, Robin Lee Powell wrote:
> $ wc -l /tmp/list
> 1000 /tmp/list
>
> $ rsync -i -aPv --ignore-existing --files-from=/tmp/list /backups/ ut00-s00010:/backups/
> building file list ...
> 3937 files to consider
>
> I am totally baffled.
>
> That's not such a big deal, but the list I'm *actually* using has
> twenty *million* files in it.  At a couple hundred files a second,
> if it's going to check 4 times the number of files, that's a *huge*
> time waste.  What's going on?
>
> Here's what the list looks like:
>
> $ head /tmp/list
> cpool/b/c/5/bc5ea7a79a4824c6729645c66b562e6b
> cpool/7/7/8/77865de94585b4581f07e54065c7b1e3
> cpool/2/5/0/250f326bfa69c9da011f809a8b46cea7
> cpool/3/3/8/3382672447e7f9a00ea755cee7ad5187
> cpool/1/0/e/10eec0876f979ca8773f63e697be0adf
> cpool/0/e/b/0ebf2a81c863702baa4eb38ec3cef655
> cpool/3/6/c/36c915e781561292d9ae73e127504d0d
> cpool/b/5/0/b50dcb17dac0808c4b5de1a9a3b747af
> cpool/8/5/f/85fb8dc29ed1597c3fd0725ff91da279
> cpool/9/0/8/90829abb5879fcbe39c2f55c4211b3c5
>
> They are all like that, and they are all files, not directories.
>
> I thought it could be rsync checking the directories that have those
> files in them, but there are only 4300 directories,

I'm trying it with

$ wc -l /tmp/list
1000000 /tmp/list

and currently it's up to:

2198200 files...

So again, that's a *huge* amount of wasted time.  Why?  And why
isn't it transferring incrementally?  It's rsync 3.0 on both ends.

-Robin

--
http://singinst.org/ :  Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei".  My personal page: http://www.digitalkingdom.org/rlp/
In <20101210171139.GD27025 at digitalkingdom.org>, on 12/10/10
   at 09:11 AM, Robin Lee Powell <rlpowell at digitalkingdom.org> said:

Hi,

> $ rsync -i -aPv --ignore-existing --files-from=/tmp/list /backups/
>   ut00-s00010:/backups/
> building file list ...
> 3937 files to consider

> That's not such a big deal, but the list I'm *actually* using has
> twenty *million* files in it.  At a couple hundred files a second,
> if it's going to check 4 times the number of files, that's a *huge*
> time waste.  What's going on?

I'm not quite sure what's going on either.  What I recommend is to
cut your list down to 1 file and use

  rsync -ii -aPv --ignore-existing --files-from=/tmp/list \
    /backups/ ut00-s00010:/backups

If this does not answer the question, add one more -v.

Steven

--
----------------------------------------------------------------------
"Steven Levine" <steve53 at earthlink.net>  eCS/Warp/DIY etc.
www.scoug.com www.ecomstation.com
----------------------------------------------------------------------
In <20101210171139.GD27025 at digitalkingdom.org>, on 12/10/10
   at 09:11 AM, Robin Lee Powell <rlpowell at digitalkingdom.org> said:

Hi,

After a quick testcase, I can see what's happening.  You have

> $ rsync -i -aPv --ignore-existing --files-from=/tmp/list /backups/
>   ut00-s00010:/backups/
> building file list ...
> 3937 files to consider

and

  cpool/b/c/5/bc5ea7a79a4824c6729645c66b562e6b

Each subdirectory counts as a file to consider.

Steven

--
----------------------------------------------------------------------
"Steven Levine" <steve53 at earthlink.net>  eCS/Warp/DIY etc.
www.scoug.com www.ecomstation.com
----------------------------------------------------------------------
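[Steven's point can be turned into a rough back-of-the-envelope count, a
sketch of the bookkeeping rather than what rsync literally does: for each
--files-from entry, emit its chain of parent directories plus the file
itself, then count the lines.  With the cpool/X/Y/Z/<hash> layout that is
five entries per file:

  $ awk -F/ '{ p = $1; print p; for (i = 2; i < NF; i++) { p = p "/" $i; print p }; print $0 }' /tmp/list | wc -l

The 3937 rsync reports falls between the distinct-directory count and this
per-entry total, which fits what the rest of the thread describes: duplicate
implied directories only get collapsed when the list is ordered so that
identical prefixes appear together.]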
On Fri, Dec 10, 2010 at 02:42:45PM -0800, Steven Levine wrote:
> In <20101210220340.GI27025 at digitalkingdom.org>, on 12/10/10 at
> 02:03 PM, Robin Lee Powell <rlpowell at digitalkingdom.org> said:
>
> Hi,
>
> > On the other hand, given this:
> >
> > cpool/b/c/5/bc50007d8ab0221cb2b2b61e0754224c
> > cpool/b/c/5/bc500094bb43d0f4235363f65658d231
> > cpool/7/7/8/77865de94585b4581f07e54065c7b1e3
> >
> > , that is, same files, changed order, we have:
> >
> > In the middle case, it checks all the /b/ stuff a second time
> > because of *the order of the file*.
>
> I missed this.  My test case was right, but I did not supply
> sufficient -v's to see it and I did not have time to look at the
> code.  The list I saw was after sorting and the duplicates were
> gone.
>
> > Which is fine and appropriate in terms of RAM usage, I guess, but
> > very very surprising.
>
> I don't think the issue is RAM use per se.  Reviewing the code,
> I'd say it's more likely that no one considered the performance
> impact of huge, out-of-order file lists.  I would not consider
> your use case standard.

Oh, I completely agree that my use case is unusual.  The resulting
behaviour is still a shock, though.

> > I call this a feature, but a documentation fail.  At least, I
> > can't find anything in the docs that mentions "and you'd better
> > sort the file or you won't like the results at all".
>
> I recommend you put in a documentation enhancement ticket for
> this.

Doing that now.

-Robin

--
http://singinst.org/ :  Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei".  My personal page: http://www.digitalkingdom.org/rlp/
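[For reference, the workaround the thread converges on is to sort the list
before handing it to --files-from, so that identical directory prefixes sit
next to each other and each implied directory is only expanded once.  A
minimal sketch, with /tmp/list.sorted as a made-up name for the sorted copy:

  $ sort /tmp/list > /tmp/list.sorted
  $ rsync -i -aPv --ignore-existing --files-from=/tmp/list.sorted \
      /backups/ ut00-s00010:/backups/
]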