I guess I still haven't figured out the entire semantics of the --include and --exclude options. From reading the man page, it seems to say that each file being checked is tested against each pattern in order; when one matches, testing stops, and whether that pattern is an --include or an --exclude determines whether the file is included or excluded.

So I have on my server a big file tree. I want to use rsync to download only the PDF files, which make up a small portion of that tree. So I try it this way:

    rsync -aHPvz --include '*.pdf' --exclude '**' user@host:<source> <destination>

which gives me nothing. For reference, I try:

    rsync -aHPvz --include '*.pdf' user@host:<source> <destination>

which starts downloading other files. That confirms that the default final action is equivalent to --include '**' or something like that. So it seems the include pattern isn't matching. So I try variations:

    rsync -aHPvz --include '*.pdf' --exclude '*' user@host:<source> <destination>
    rsync -aHPvz --include '**.pdf' --exclude '**' user@host:<source> <destination>
    rsync -aHPvz --include '**.pdf' --exclude '*' user@host:<source> <destination>
    rsync -aHPvz --include '**/*.pdf' --exclude '**' user@host:<source> <destination>
    rsync -aHPvz --include '**/*.pdf' --exclude '*' user@host:<source> <destination>
    rsync -aHPvz --include '/**.pdf' --exclude '**' user@host:<source> <destination>
    rsync -aHPvz --include '/**.pdf' --exclude '*' user@host:<source> <destination>
    rsync -aHPvz --include '/**/*.pdf' --exclude '**' user@host:<source> <destination>
    rsync -aHPvz --include '/**/*.pdf' --exclude '*' user@host:<source> <destination>

None of these work. So finally, I replicate the file tree on the server with:

    cp -al <source> <alternatename>

and proceed to remove all non-PDF files:

    find <alternatename> -type f ! -name '*.pdf' -exec rm -f {} ';'

Then I do:

    rsync -aHPvz --include '*.pdf' --exclude '**' user@host:<alternatename> <destination>

which now works. Can rsync do this by itself?
Is there a way to tell rsync "only download this particular extension"? How SHOULD I have done this?

I generally understand things best by knowing what sequence of steps is performed. I thought I understood this for rsync based on what the man page said. I guess one of us is wrong.

I'm running:

    rsync  version 2.6.0  protocol version 27

-- 
-----------------------------------------------------------------------------
| Phil Howard KA9WGN       | http://linuxhomepage.com/      http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-----------------------------------------------------------------------------
On Fri, 26 Mar 2004, Phil Howard <phil-rsync-2@ipal.net> wrote:
>
> So I have on my server a big file tree. I want to use rsync to download
> only the PDF files, which make up a small portion of that tree. So I try
> it this way:
>
>   rsync -aHPvz --include '*.pdf' --exclude '**' user@host:<source> <destination>
>
> which gives me nothing.

Hi Phil,

Your goal seems to be to create a tree that consists only of *.pdf files and their necessary directory nodes. Right?

The closest you can get to that with standard includes/excludes is with this combination:

    --include '*/'      (process/create all directory nodes)
    --include '*.pdf'   (include all *.pdf files)
    --exclude '*'       (exclude everything else)

The disadvantage is that you will get the entire tree, even the branches with no pdf files.

rsync 2.6.0 has a new 'files-from' option which will do what you need, but you have to create the file list yourself (easy enough to do with the 'find' command). But you are at a disadvantage in that you are doing a pull and may not have access to the source.

Here's a possibility. Do a --dry-run first to determine the names of the pdf files. Grep the -v output of rsync for '\.pdf$', store that list in a file, then do a real run with the files-from option.

I didn't try this - so I haven't verified that this will work.

-- 
John Van Essen  Univ of MN Alumnus  <vanes002@umn.edu>