thr3ads.net - rsync - --include vs. --exclude [Mar 2004]

Phil Howard

2004-Mar-27 03:22 UTC

--include vs. --exclude

I guess I still haven't figured out the entire semantics of the --include
and --exclude options.  From reading the man page, it seems to say that
what happens is that each file being checked is tested against each pattern
in order, and when one matches the tests end, and whether it is --include
or --exclude determines if that file is included or excluded.
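
As a rough illustration of the first-match rule described above, here is a toy model in plain shell. It is only a sketch: the function name first_match and the "+ PATTERN" / "- PATTERN" rule notation are invented for this example, and shell `case` globbing stands in for rsync's own matcher, which has extra rules (anchoring, `**`, trailing slashes) that are not modeled here.

```shell
# Toy model of first-match-wins filtering (NOT rsync's real matcher).
# Usage: first_match NAME RULE...   where each RULE is "+ PATTERN" or "- PATTERN".
# The first rule whose pattern matches NAME decides the outcome;
# if no rule matches, the name is included by default.
first_match() {
    name=$1
    shift
    for rule in "$@"; do
        action=${rule%% *}    # text before the first space: "+" or "-"
        pattern=${rule#* }    # text after the first space: the glob
        case $name in
            $pattern) printf '%s\n' "$action"; return ;;
        esac
    done
    printf '%s\n' "+"
}

first_match report.pdf '+ *.pdf' '- *'   # prints +
first_match docs       '+ *.pdf' '- *'   # prints -
first_match notes.txt  '+ *.pdf'         # prints + (no rule matched)
```

Note that in this model a directory name such as docs is tested against the same rules as a file name.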

So I have on my server a big file tree.  I want to use rsync to download
only the PDF files, which make up a small portion of that tree.  So I try
it this way:

  rsync -aHPvz --include '*.pdf' --exclude '**' user@host:<source> <destination>

which gives me nothing.  For reference, I try:

  rsync -aHPvz --include '*.pdf' user@host:<source> <destination>

which starts downloading other files.  That confirms that the default final
action is equivalent to --include '**' or something like that.

So it seems the include pattern isn't matching.  So I try variations:

  rsync -aHPvz --include '*.pdf' --exclude '*' user@host:<source> <destination>
  rsync -aHPvz --include '**.pdf' --exclude '**' user@host:<source> <destination>
  rsync -aHPvz --include '**.pdf' --exclude '*' user@host:<source> <destination>
  rsync -aHPvz --include '**/*.pdf' --exclude '**' user@host:<source> <destination>
  rsync -aHPvz --include '**/*.pdf' --exclude '*' user@host:<source> <destination>
  rsync -aHPvz --include '/**.pdf' --exclude '**' user@host:<source> <destination>
  rsync -aHPvz --include '/**.pdf' --exclude '*' user@host:<source> <destination>
  rsync -aHPvz --include '/**/*.pdf' --exclude '**' user@host:<source> <destination>
  rsync -aHPvz --include '/**/*.pdf' --exclude '*' user@host:<source> <destination>

None of these work.

So finally, I replicate the file tree on the server with:

  cp -al <source> <alternatename>

And proceed to remove all non-PDF files:

  find <alternatename> -type f ! -name '*.pdf' -exec rm -f {} ';'

Then I do:

  rsync -aHPvz --include '*.pdf' --exclude '**' user@host:<alternatename> <destination>

which now works.

Can rsync do this by itself?  Is there a way to tell rsync "only download
this particular extension"?  How SHOULD I have done this?

I generally understand things best by knowing what sequence of steps is
performed.  I thought I understood this for rsync based on what the man
page said.  I guess one of us is wrong.

I'm running:  rsync  version 2.6.0  protocol version 27

-- 
-----------------------------------------------------------------------------
| Phil Howard KA9WGN       | http://linuxhomepage.com/      http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-----------------------------------------------------------------------------

John Van Essen

2004-Mar-29 01:16 UTC

On Fri, 26 Mar 2004, Phil Howard <phil-rsync-2@ipal.net> wrote:
>
> So I have on my server a big file tree.  I want to use rsync to download
> only the PDF files, which make up a small portion of that tree.  So I try
> it this way:
> 
>   rsync -aHPvz --include '*.pdf' --exclude '**' user@host:<source> <destination>
> 
> which gives me nothing.

Hi Phil,

Your goal seems to be to create a tree that consists only of *.pdf
files and their necessary directory nodes.  Right?

The closest you can get to that with standard includes/excludes is
with this combination:

  --include '*/'     (process/create all directory nodes)
  --include '*.pdf'  (include all *.pdf files)
  --exclude '*'      (exclude everything else)

The disadvantage is that you will get the entire tree, even the
branches with no pdf files.
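
Put together as one command, the three rules above might look like the sketch below. The order matters because the first matching rule wins: '*/' has to come before '*' so that directories are not excluded, since rsync does not descend into a directory that has been excluded. The function name pull_pdfs is made up for this example; the -aHPvz flags are simply the ones from the original post.

```shell
# Sketch: pull only *.pdf files, plus every directory node, from SRC to DEST.
# Usage: pull_pdfs user@host:/some/tree/ /local/copy/
pull_pdfs() {
    rsync -aHPvz \
        --include '*/' \
        --include '*.pdf' \
        --exclude '*' \
        "$1" "$2"
}
```

Newer rsync releases also have a --prune-empty-dirs (-m) option that is aimed at the empty-branch disadvantage. As far as I know it was added after 2.6.0, so check the man page for your version before relying on it.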

rsync 2.6.0 has a new 'files-from' option which will do what you need,
but you have to create the file list yourself (easy enough to do with
the 'find' command).  But you are at a disadvantage in that you are
doing a pull and may not have access to the source.

Here's a possibility.  Do a --dry-run first to determine the names
of the pdf files.  Grep the -v output of rsync for '\.pdf$', store
that list in a file, then do a real run with the files-from option.

I didn't try this - so I haven't verified that this will work.
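
A sketch of that two-pass idea follows, equally unverified against a real server. The function names filter_pdf_names and pull_pdfs_two_pass are invented for this example. It assumes the source is given with a trailing slash, so that the names rsync prints are relative to the source directory, which is what --files-from expects. It also assumes every PDF path appears on its own line in the -v output.

```shell
# Keep only the lines of an rsync -v listing that end in ".pdf".
# Header and summary lines, directories ("dir/") and symlinks ("a -> b")
# do not end in ".pdf" and are dropped.
filter_pdf_names() {
    grep '\.pdf$'
}

# Usage: pull_pdfs_two_pass user@host:/some/tree/ /local/copy/
pull_pdfs_two_pass() {
    src=$1
    dest=$2
    list=$(mktemp) || return 1

    # Pass 1: --dry-run transfers nothing; -v prints the candidate paths.
    rsync -aHv --dry-run "$src" "$dest" | filter_pdf_names > "$list"

    # Pass 2: transfer only the paths collected in pass 1.
    rsync -aHPvz --files-from="$list" "$src" "$dest"
    status=$?

    rm -f "$list"
    return "$status"
}
```

Only the directories that lead to a listed PDF get created this way, which avoids the empty branches of the include/exclude approach.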
-- 
        John Van Essen  Univ of MN Alumnus  <vanes002@umn.edu>
