As mentioned on the rsync home page, the --files-from=FILE option in
rsync version 2.6.0 is a useful option that allows one to "specify a list
of files to transfer, and can be much more efficient than a recursive
descent using include/exclude statements (if you know in advance what
files you want to transfer)".

However, --files-from does not help one implement the --rsync-exclude=FILE
option previously submitted to this list (see the up-to-date patch below).
By definition this requires a recursive descent to determine the file
list, so it cannot be readily implemented with a wrapper. It requires
direct interaction with rsync's hierarchical exclude/include mechanism.

The following patch ports the rsync-exclude patch to rsync 2.6.1pre-1
and also fixes a bug that was introduced in 2.6.0 exclude/include
option that prevents included patterns in one list from overriding
previously excluded patterns from another. This bug becomes apparent on
noting that the 0 return code from check_exclude in the include case is
now simply ignored in check_exclude_file (rather than preventing lists
with lower precedence from being examined, as was the case in earlier
versions):

    ...
    if (exclude_list && check_exclude(exclude_list, fname, is_dir))
        return 1;
    if (local_exclude_list
        && check_exclude(local_exclude_list, fname, is_dir))
        return 1;
    ...

If you look at the equivalent section of code in 2.5.7, the behaviour is
different (in the case of an included pattern, local_exclude_list is not
examined):

    if (exclude_list) {
        for (n=0; exclude_list[n]; n++) {
            ent = exclude_list[n];
            if (check_one_exclude(name, ent, st)) {
                report_exclude_result(name, ent, st);
                return !ent->include;
            }
        }
    }

    if (local_exclude_list) {
        for (n=0; local_exclude_list[n]; n++) {
            ent = local_exclude_list[n];
            if (check_one_exclude(name, ent, st)) {
                report_exclude_result(name, ent, st);
                return !ent->include;
            }
        }
    }

To further complicate things, in both versions the relative search order
is also not what one would expect (the same applies to the search order
within each list in check_exclude; an entry at the end of the list should
override a previous entry). It seems reasonable that if there are
multiple matching patterns, the most local and most recent matching
pattern will be used, in this order: --cvs-exclude, --exclude.

As a result of the above issues, an include pattern in local_exclude_list
(cvs-exclude) will not override a global exclude pattern in exclude_list,
contrary to what one would expect.

The patch below fixes all of these problems and adds the very flexible
--rsync-exclude=FILE feature (useful for rsync-based backups; e.g. see
www.math.ualberta.ca/imaging/rlbackup).

I would very much appreciate it if this patch were incorporated into the
next release of rsync to fix the unexpected behaviour described above and
also to reduce the amount of post-release maintenance required to port
--rsync-exclude to new releases. (Feel free to rename it as
--recursive-exclude or some such thing).

-- 
John Bowman
University of Alberta

This is a patch to add an --rsync-exclude=FILE option to rsync-2.6.1pre-1.
In any given directory, patterns listed in FILE can be used to
recursively exclude/include files in that directory and all of its
descendants. Prefixing a file name with "+ " will force inclusion of the file.
If there are multiple matching patterns, the most local and most recent
matching pattern will be used, in this order: --rsync-exclude,
--cvs-exclude, --exclude.

    rsync --rsync-exclude=.rsync -vaxH /here /there

will copy all files from /here to /there, excluding any files listed in
a file .rsync in the directory containing this file and all of its
subdirectories.

This feature has advantages over --cvs-exclude for backing up large file
systems since the .cvsignore files only apply to the current directory:
unless the .cvsignore restrictions apply to the entire tree they must be
duplicated in each subdirectory. Furthermore, the --cvs-exclude option is
not intended for general system backups (for example, unless the default
list is cleared with "!", it automatically excludes *.a and *.so
libraries).

diff -ru rsync-2.6.1pre-1/exclude.c rsync-2.6.1pre-1J/exclude.c
--- rsync-2.6.1pre-1/exclude.c	2004-02-23 20:23:53.000000000 +0100
+++ rsync-2.6.1pre-1J/exclude.c	2004-04-08 10:41:54.000000000 +0200
@@ -197,35 +197,39 @@
 
 static void report_exclude_result(char const *name,
 				  struct exclude_struct const *ent,
-				  int name_is_dir)
+				  int name_is_dir, const char *type)
 {
 	/* If a trailing slash is present to match only directories,
 	 * then it is stripped out by make_exclude.  So as a special
 	 * case we add it back in here. */
 
 	if (verbose >= 2) {
-		rprintf(FINFO, "[%s] %s %s %s because of pattern %s%s\n",
+		rprintf(FINFO, "[%s] %s %s %s because of %s pattern %s%s\n",
 			who_am_i(),
 			ent->include ? "including" : "excluding",
 			name_is_dir ? "directory" : "file",
-			name, ent->pattern,
+			name, type, ent->pattern,
 			ent->directory ? "/" : "");
 	}
 }
 
 
 /*
- * Return true if file NAME is defined to be excluded by either
- * LOCAL_EXCLUDE_LIST or the globals EXCLUDE_LIST.
+ * Return -1 (+1) if file NAME is defined to be excluded (included), according
+ * to the most recent matching pattern in list. Otherwise return 0;
  */
-int check_exclude(struct exclude_struct **list, char *name, int name_is_dir)
+int check_exclude(struct exclude_struct **list, char *name, int name_is_dir,
+		  const char *type)
 {
 	struct exclude_struct *ent;
 
-	while ((ent = *list++) != NULL) {
+	int n;
+	for (n=0; list[n]; n++) ;
+	for (n--; n >= 0; n--) {
+		ent = list[n];
 		if (check_one_exclude(name, ent, name_is_dir)) {
-			report_exclude_result(name, ent, name_is_dir);
-			return !ent->include;
+			report_exclude_result(name, ent, name_is_dir, type);
+			return (ent->include ? 1 : -1);
 		}
 	}
 
diff -ru rsync-2.6.1pre-1/flist.c rsync-2.6.1pre-1J/flist.c
--- rsync-2.6.1pre-1/flist.c	2004-02-11 03:48:58.000000000 +0100
+++ rsync-2.6.1pre-1J/flist.c	2004-04-08 10:50:46.000000000 +0200
@@ -39,6 +39,7 @@
 extern int numeric_ids;
 
 extern int cvs_exclude;
+extern const char *rsync_exclude;
 
 extern int recurse;
 extern char curr_dir[MAXPATHLEN];
@@ -66,6 +67,7 @@
 extern struct exclude_struct **exclude_list;
 extern struct exclude_struct **server_exclude_list;
 extern struct exclude_struct **local_exclude_list;
+static struct exclude_struct **recur_local_exclude_list;
 
 int io_error;
 
@@ -210,6 +212,7 @@
  */
 static int check_exclude_file(char *fname, int is_dir, int exclude_level)
 {
+	int rc;
 #if 0 /* This currently never happens, so avoid a useless compare. */
 	if (exclude_level == NO_EXCLUDES)
 		return 0;
@@ -225,16 +228,24 @@
 			return 0;
 		}
 	}
-	if (server_exclude_list
-	 && check_exclude(server_exclude_list, fname, is_dir))
-		return 1;
+	/* Precedence: use the most local and most recent matching pattern,
+	   in this order: server, --rsync-exclude, --cvs-exclude, --exclude */
+	if (server_exclude_list &&
+	    (rc=check_exclude(server_exclude_list, fname, is_dir, "server")))
+		return (rc < 0);
 	if (exclude_level != ALL_EXCLUDES)
 		return 0;
-	if (exclude_list && check_exclude(exclude_list, fname, is_dir))
-		return 1;
-	if (local_exclude_list
-	 && check_exclude(local_exclude_list, fname, is_dir))
-		return 1;
+	if (recur_local_exclude_list &&
+	    (rc=check_exclude(recur_local_exclude_list, fname, is_dir,
+			      "rsync-exclude")))
+		return (rc < 0);
+	if (local_exclude_list &&
+	    (rc=check_exclude(local_exclude_list, fname, is_dir,
+			      "cvs-exclude")))
+		return (rc < 0);
+	if (exclude_list &&
+	    (rc=check_exclude(exclude_list, fname, is_dir, "exclude")))
+		return (rc < 0);
 	return 0;
 }
 
@@ -503,7 +514,32 @@
 	io_write_phase = "unknown";
 }
 
+static struct exclude_struct **copy_exclude_list(struct exclude_struct **from) {
+	struct exclude_struct **to;
+	int i;
+	int len=0;
+	int size;
+
+	if (!from) return NULL;
+
+	for (; from[len]; len++) ;
+	size=sizeof(struct exclude_struct *)*(len+1);
+	to = (struct exclude_struct **) malloc(size);
+	if (!to) out_of_memory("copy_exclude_list");
+
+	size=sizeof(struct exclude_struct);
+	for (i=0; i < len; i++) {
+		struct exclude_struct *p;
+		p=to[i]=(struct exclude_struct *) malloc(size);
+		if (!p) out_of_memory("copy_exclude_list");
+		*p=*from[i];
+		p->pattern=strdup(from[i]->pattern);
+		if (!p->pattern) out_of_memory("copy_exclude_list");
+	}
+	to[len]=NULL;
+	return to;
+}
 
 void receive_file_entry(struct file_struct **fptr, unsigned short flags,
 			struct file_list *flist, int f)
@@ -925,8 +961,11 @@
 	if (recursive && S_ISDIR(file->mode)
 	    && !(file->flags & FLAG_MOUNT_POINT)) {
 		struct exclude_struct **last_exclude_list = local_exclude_list;
+		struct exclude_struct **recur_last_exclude_list
+			= recur_local_exclude_list;
 		send_directory(f, flist, f_name_to(file, fbuf));
 		local_exclude_list = last_exclude_list;
+		recur_local_exclude_list = recur_last_exclude_list;
 		return;
 	}
 }
@@ -963,6 +1002,7 @@
 	}
 
 	local_exclude_list = NULL;
+	recur_local_exclude_list = copy_exclude_list(recur_local_exclude_list);
 
 	if (cvs_exclude) {
 		if (strlcpy(p, ".cvsignore", MAXPATHLEN - offset)
@@ -976,6 +1016,18 @@
 		}
 	}
 
+	if (rsync_exclude) {
+		if (strlen(fname) + strlen(rsync_exclude) <= MAXPATHLEN - 1) {
+			strcpy(p, rsync_exclude);
+			add_exclude_file(&recur_local_exclude_list,fname,MISSING_OK,ADD_EXCLUDE);
+		} else {
+			io_error = 1;
+			rprintf(FINFO,
+				"cannot rsync-exclude in long-named directory %s\n",
+				fname);
+		}
+	}
+
 	for (errno = 0, di = readdir(d); di; errno = 0, di = readdir(d)) {
 		char *dname = d_name(di);
 		if (dname[0] == '.' && (dname[1] == '\0'
@@ -999,6 +1051,10 @@
 	if (local_exclude_list)
 		free_exclude_list(&local_exclude_list); /* Zeros pointer too */
 
+	if (recur_local_exclude_list) {
+		free_exclude_list(&recur_local_exclude_list);
+	}
+
 	closedir(d);
 }
 
@@ -1022,6 +1078,8 @@
 	if (show_filelist_p() && f != -1)
 		start_filelist_progress("building file list");
 
+	recur_local_exclude_list = NULL;
+
 	start_write = stats.total_written;
 
 	flist = flist_new(f == -1 ? WITHOUT_HLINK : WITH_HLINK,
diff -ru rsync-2.6.1pre-1/options.c rsync-2.6.1pre-1J/options.c
--- rsync-2.6.1pre-1/options.c	2004-02-22 09:56:43.000000000 +0100
+++ rsync-2.6.1pre-1J/options.c	2004-04-08 10:11:13.000000000 +0200
@@ -47,6 +47,7 @@
 int update_only = 0;
 int cvs_exclude = 0;
 int dry_run = 0;
+const char *rsync_exclude = NULL;
 int local_server = 0;
 int ignore_times = 0;
 int delete_mode = 0;
@@ -267,6 +268,7 @@
   rprintf(F," -P                          equivalent to --partial --progress\n");
   rprintf(F," -z, --compress              compress file data\n");
   rprintf(F," -C, --cvs-exclude           auto ignore files in the same way CVS does\n");
+  rprintf(F,"     --rsync-exclude=FILE    recursively exclude patterns locally listed in FILE\n");
   rprintf(F,"     --exclude=PATTERN       exclude files matching PATTERN\n");
   rprintf(F,"     --exclude-from=FILE     exclude patterns listed in FILE\n");
   rprintf(F,"     --include=PATTERN       don't exclude files matching PATTERN\n");
@@ -333,6 +335,7 @@
   {"dry-run",         'n', POPT_ARG_NONE,   &dry_run, 0, 0, 0 },
   {"sparse",          'S', POPT_ARG_NONE,   &sparse_files, 0, 0, 0 },
   {"cvs-exclude",     'C', POPT_ARG_NONE,   &cvs_exclude, 0, 0, 0 },
+  {"rsync-exclude",    0,  POPT_ARG_STRING, &rsync_exclude, 0, 0, 0 },
   {"update",          'u', POPT_ARG_NONE,   &update_only, 0, 0, 0 },
   {"links",           'l', POPT_ARG_NONE,   &preserve_links, 0, 0, 0 },
   {"copy-links",      'L', POPT_ARG_NONE,   &copy_links, 0, 0, 0 },
diff -ru rsync-2.6.1pre-1/proto.h rsync-2.6.1pre-1J/proto.h
--- rsync-2.6.1pre-1/proto.h	2004-02-18 00:13:06.000000000 +0100
+++ rsync-2.6.1pre-1J/proto.h	2004-04-07 11:42:21.000000000 +0200
@@ -52,7 +52,8 @@
 void setup_protocol(int f_out,int f_in);
 int claim_connection(char *fname,int max_connections);
 void free_exclude_list(struct exclude_struct ***listp);
-int check_exclude(struct exclude_struct **list, char *name, int name_is_dir);
+int check_exclude(struct exclude_struct **list, char *name, int name_is_dir,
+		  const char *type);
 void add_exclude(struct exclude_struct ***listp, const char *pattern, int include);
 void add_exclude_file(struct exclude_struct ***listp, const char *fname,
 		      int fatal, int include);
diff -ru rsync-2.6.1pre-1/rsync.yo rsync-2.6.1pre-1J/rsync.yo
--- rsync-2.6.1pre-1/rsync.yo	2004-03-24 22:58:50.000000000 +0100
+++ rsync-2.6.1pre-1J/rsync.yo	2004-04-07 11:42:21.000000000 +0200
@@ -327,6 +327,7 @@
  -P                          equivalent to --partial --progress
  -z, --compress              compress file data
  -C, --cvs-exclude           auto ignore files in the same way CVS does
+     --rsync-exclude=FILE    recursively exclude patterns locally listed in FILE
      --exclude=PATTERN       exclude files matching PATTERN
      --exclude-from=FILE     exclude patterns listed in FILE
      --include=PATTERN       don't exclude files matching PATTERN
@@ -645,6 +646,13 @@
 .cvsignore file and matches one of the patterns listed therein.  See
 the bf(cvs(1)) manual for more information.
 
+dit(bf(--rsync-exclude=FILE)) In any given directory, patterns listed in
+FILE are excluded from the file lists associated with that directory
+and all of its descendants. Prefixing the file name with "+ " will force
+inclusion of the file. If there are multiple matching patterns, the most
+local and most recent matching pattern will be used, in this order:
+--rsync-exclude, --cvs-exclude, --exclude.
+
 dit(bf(--exclude=PATTERN)) This option allows you to selectively exclude
 certain files from the list of files to be transferred. This is most
 useful in combination with a recursive transfer.
diff -ru rsync-2.6.1pre-1/util.c rsync-2.6.1pre-1J/util.c
--- rsync-2.6.1pre-1/util.c	2004-02-18 00:13:10.000000000 +0100
+++ rsync-2.6.1pre-1J/util.c	2004-04-07 11:42:21.000000000 +0200
@@ -476,7 +476,7 @@
 	if (server_exclude_list) {
 		for (s = arg; (s = strchr(s, '/')) != NULL; ) {
 			*s = '\0';
-			if (check_exclude(server_exclude_list, arg, 1)) {
+			if (check_exclude(server_exclude_list, arg, 1, "server")) {
 				/* We must leave arg truncated! */
 				return 1;
 			}
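
The precedence scheme the patch implements can be sketched in isolation. What follows is a minimal illustration under stated assumptions, not code from the patch: struct pat, check_list_last_match and excluded_by_precedence are hypothetical names, and POSIX fnmatch() stands in for rsync's own pattern matcher. Each list is scanned from its last entry to its first and yields +1 (include), -1 (exclude) or 0 (no match); the lists are then consulted from most local to most global, and the first list with an opinion decides.

```c
#include <assert.h>
#include <fnmatch.h>    /* POSIX; an assumption, rsync uses its own matcher */
#include <stddef.h>

/* Hypothetical pattern entry, simplified from rsync's struct exclude_struct. */
struct pat {
	const char *pattern;
	int include;        /* 1 for a "+ " entry, 0 for a plain exclude */
};

/* Scan one list backwards so that the most recent matching entry wins.
 * Returns +1 if NAME is included, -1 if excluded, 0 if nothing matches. */
int check_list_last_match(const struct pat *list, size_t n, const char *name)
{
	assert(list != NULL || n == 0);
	while (n-- > 0) {
		if (fnmatch(list[n].pattern, name, 0) == 0)
			return list[n].include ? 1 : -1;
	}
	return 0;
}

/* Consult the lists in precedence order (most local first). The first
 * list that matches NAME decides; returns 1 if NAME is excluded. */
int excluded_by_precedence(const struct pat *const *lists,
			   const size_t *counts, size_t nlists,
			   const char *name)
{
	size_t i;
	for (i = 0; i < nlists; i++) {
		int rc = check_list_last_match(lists[i], counts[i], name);
		if (rc != 0)
			return rc < 0;
	}
	return 0;           /* no list matched: the file is kept */
}
```

With a per-directory list of "*.o" followed by "+ keep.o" and a global list excluding "keep.*", the file keep.o is kept because the per-directory include is both more local and more recent, while keep.txt falls through to the global list and is excluded.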
On Fri, Apr 09, 2004 at 06:33:58PM +0200, John Bowman wrote:
> and also fixes a bug that was introduced in 2.6.0 exclude/include
> option that prevents included patterns in one list from overriding
> previously excluded patterns from another.

This isn't a bug -- it was done as part of the exclude cleanup to fix
some potential mishaps. For instance, if a server has specified these
include/exclude rules:

    + /foo.c
    - /*.c

it was done that way to say "I'll let you copy foo.c, but no other .c
files." Not to say, "You must take foo.c whether you want it or not."
Thus, an inclusion match in a list just means that the name avoids being
excluded by that particular list.

Yes, this means that if any of the 3 exclude lists tells rsync to
exclude a file that it gets excluded. One can argue both for and against
this when thinking about how the normal excludes interact with the CVS
excludes. Consider:

    rsync -C --include=*/ --include=*.c --exclude=* ...

If a .cvsignore file specifies "foo.c", I want it to be excluded (because
it's a generated .c file). However, I could argue that I should also be
able to override the .cvsignore-excluded file somehow, perhaps with this:

    --include=/path/foo.c

The problem is distinguishing if the user is specifying an exception to
a rule in the current list or trying to override an exclusion from the
CVS-ignore list. I can't think of a good heuristic to accomplish this at
the moment. If someone has a bright idea, let me know. Failing that, I
think the behavior in 2.6.0 (and 2.6.1pre-1) is what we want.

> an entry at the end of the list should override a previous entry

Most definitely NOT. The documented behavior has always been "first
match wins." To change this to "last match wins" would be incompatible
with prior rsync versions.

> As a result of the above issues, an include pattern in local_exclude_list
> (cvs-exclude) will not override a global exclude pattern in exclude_list,

There are no include patterns in local_exclude_list at the moment, so
this isn't a problem. If there were, we would be in an identical
conundrum to the one stated above (though with the lists reversed).

> I would very much appreciate it if this patch were incorporated into the
> next release of rsync

I'll massage it a bit and put it in the patches dir for now. We'll
consider it for the release after 2.6.1 (which is going to be more of a
feature-adding release than this one).

..wayne..
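
The 2.6.0 behaviour described in this reply can be sketched as follows. This is an illustration under stated assumptions rather than rsync's code: struct rule, list_excludes and is_excluded are hypothetical names, and POSIX fnmatch() stands in for rsync's matcher. Within one list the first matching rule wins; an include match only shields the name from that particular list; the name is excluded if any list excludes it.

```c
#include <assert.h>
#include <fnmatch.h>    /* POSIX; an assumption, rsync uses its own matcher */
#include <stddef.h>

/* Hypothetical rule type, not rsync's struct exclude_struct. */
struct rule {
	const char *pattern;
	int include;        /* 1 for "+ pattern", 0 for "- pattern" */
};

/* First match wins within a single list.
 * Returns 1 if this list excludes NAME, 0 otherwise. */
int list_excludes(const struct rule *list, size_t n, const char *name)
{
	size_t i;
	assert(list != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (fnmatch(list[i].pattern, name, 0) == 0)
			return !list[i].include;
	}
	return 0;           /* no rule matched: this list has no objection */
}

/* NAME is excluded if either list excludes it. An include match in one
 * list cannot override an exclusion coming from the other list. */
int is_excluded(const struct rule *a, size_t na,
		const struct rule *b, size_t nb, const char *name)
{
	return list_excludes(a, na, name) || list_excludes(b, nb, name);
}
```

With the server rules "+ /foo.c" and "- /*.c" from the example above, /foo.c passes the server list and /bar.c does not; a second list that excludes /foo.c still removes it, since the server's include only says the file may be copied, not that it must be.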
On Sat, Apr 10, 2004 at 08:56:16PM +0200, John Bowman wrote:
> It's a question of precedence. Feel free to change the precedence of
> the "server" excludes.

That doesn't fix anything, it just switches around which lists have the
problem. Also, server excludes MUST be obeyed because we want the module
to be able to specify files that the user never, ever gets, no matter
what.

[I wrote:]
> > Yet you don't supply a solution for the conundrum I posed.

> You can solve it on a directory-by-directory basis with rsync-exclude.

You mean by manually editing per-directory files instead of specifying
commandline options? No thank you. And not even possible if you're
talking about grabbing files from a remote server where you don't have
permissions to log in and tweak the files.

> But it gives the user no control over his backups. He is at the mercy
> of what the system operator types into the global exclude file.

What global excludes are you talking about? The ones specified by a
server daemon's config file? Those must be obeyed. The ones in the
$CVSIGNORE file? Yes, using -C does put you at the mercy of all the
cvsignore rules for the source hierarchy. Currently, to get a
CVS-excluded file you have to turn off -C and ask for the file
separately. I agree that it would be nice to make that easier somehow,
but I would want it to be changed in a way that lets the command take
full control (without editing any per-directory files on the source
system).

..wayne..
Wayne Davison
2004-Apr-17 20:18 UTC
subdir-exclude patch now in patches dir (was rsync-exclude)
On Sat, Apr 17, 2004 at 08:45:15PM +0200, John Bowman wrote:
> I tried out the subdir-exclude patch and it works well for me. The
> next release of rlbackup (coming in a few days) will use this version.

Note that I renamed the option in the patch that comes along with the
2.6.1pre-2 release; it is now:

    --perdir-exclude-from=FILE

I wanted the "-from" suffix so that it was associated both with the
other *-from options and the --from0 option (which does affect the
line-endings when reading these per-directory exclude files). The
"perdir" prefix is the best I could come up with that described this
exclude option as briefly as possible (since the option name is rather
long in total). If you have a better idea, let me know.

..wayne..