Hi,

I use rsync 3.0.4 on two openSUSE 11 machines. Every night, a big tree on machine A is synced to machine B. These machines are Samba PDC and BDC, users and groups are LDAP-based, and ACLs are heavily used. There are about 2.8 million files and dirs, 2.2 terabytes of data, and a complete ACL list produced with getfacl comes to some 600 megabytes.

I've just noticed that ACLs get partially corrupted on the receiving side. This is my command line, running as root:

  /usr/bin/rsync --quiet --links --numeric-ids --acls --perms --times
  --recursive --owner --group --delete-during --ignore-errors --backup
  --backup-dir=/wzb/backup/wzb= --password-file=/etc/wzb/rsync/password
  --filter '- /backup/' rsync://selene.wzb.eu/wzb /wzb

/wzb has subtrees user, group, software. Wrong ACLs show up in several places. /wzb/user (home directories) seems ok. Both /wzb/group and /wzb/software have wrong subentries. In /wzb/software, all subentry ACLs, both files and dirs, are partially (but not completely) wrong, and all in the same way.

Example: directory /wzb/software/aida

This is as should be:

  # file: aida
  # owner: root
  # group: root
  user::rwx
  group::r-x
  group:users:r-x
  mask::rwx
  other::---
  default:user::rwx
  default:group::r-x
  default:group:users:r-x
  default:mask::rwx
  default:other::---

This is what happens:

  # file: aida
  # owner: root
  # group: root
  user::rwx
  user:spura:rwx
  group::---
  mask::rwx
  other::---
  default:user::rwx
  default:group::r-x
  default:group:users:r-x
  default:mask::rwx
  default:other::---

It looks like the lines

  group::r-x
  group:users:r-x

are replaced by

  user:spura:rwx
  group::---

The latter is a (partial) ACL from /wzb/user/spura, the "spura" person's home dir. The very same thing happens to all files and dirs below /wzb/software. User "spura" is completely wrong here.

Further observations:

- At the next rsync run, the same thing occurs but with a different(!) user.
- The number of ACL entries remains unchanged.
- Only existing users show up in the wrong ACLs.
- rsync happens to back up the /wzb/user and /wzb/group subtrees first, making it understandable that the wrong ACL parts are somewhere in rsync's memory.
- If I rsync the /wzb/software branch alone, everything is ok.
- It does not matter whether or not I use --numeric-ids.

Peter Rindfuss

On Mon, Dec 15, 2008 at 12:20:23PM +0100, Peter Rindfuss wrote:
> The latter is a (partial) ACL from /wzb/user/spura, the "spura"
> person's home dir.

The ACL code uses an array to hold the various ACLs it finds. The receiver builds a list of the items that the sender sends to it, and is told index values for matching items from the sender's list. What I imagine is happening for you is that the sender and receiver are getting out of sync in their lists, so that an index into the receiver's list no longer refers to the same item as that index in the sender's list. What we'd need to do is to compare the two arrays and see where they differ, and then figure out where things went wrong.

If you'd like me to debug it, one thing you could do is to make a copy of the files in the transfer, get all the ACLs right, truncate all the files to 0 length in the copy, tar them up (assuming your tar will preserve the ACLs), and send me a copy.

If you'd care to debug it, I'd suggest starting with some debug output in the send_rsync_acl() and recv_rsync_acl() routines to indicate what index number each side believes relates to a literal ACL that is being transmitted, e.g.:

  fprintf(stderr, "ndx: %d, user: %d, group: %d, mask: %d, other: %d, name-cnt: %d\n",
          racl_list->count-1, racl->user_obj, racl->group_obj,
          racl->mask_obj, racl->other_obj, racl->names.count);

One will need "duo_item->racl." instead of "racl->", though.

..wayne..
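
The index scheme described above can be sketched in a few lines of C. This is a simplified model and not rsync's actual code: the struct acl_cache type, the two function names, and the use of -1 to mean "a literal ACL follows" are all assumptions made for illustration, and each ACL is reduced to a text label.

```c
#include <assert.h>
#include <string.h>

#define MAX_ACLS 16

/* Hypothetical, simplified cache: each ACL is just a text label. */
struct acl_cache {
    const char *items[MAX_ACLS];
    int count;
};

/* Sender side: return the index of a matching cached ACL, or -1 if the ACL
 * has not been seen before. A new ACL is appended to the sender's list. */
static int sender_lookup_or_add(struct acl_cache *c, const char *acl)
{
    for (int i = 0; i < c->count; i++)
        if (strcmp(c->items[i], acl) == 0)
            return i;
    assert(c->count < MAX_ACLS);
    c->items[c->count++] = acl;
    return -1;
}

/* Receiver side: ndx == -1 means a literal ACL was transmitted, so it is
 * appended to the receiver's list. Any other ndx is looked up in the
 * receiver's own list, which is only correct while both lists have been
 * built in the same order. Returns NULL for an out-of-range index. */
static const char *receiver_resolve(struct acl_cache *c, int ndx,
                                    const char *literal)
{
    if (ndx < 0) {
        assert(c->count < MAX_ACLS);
        c->items[c->count++] = literal;
        return literal;
    }
    return ndx < c->count ? c->items[ndx] : NULL;
}
```

In this model the receiver never checks the content behind an index, so anything that makes the two lists grow differently goes unnoticed and simply yields a valid but wrong ACL.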

On Mon, Dec 15, 2008 at 12:20:23PM +0100, Peter Rindfuss wrote:
> I've just noticed that ACLs get partially corrupted on the receiving side.
> This is my command line, running as root:
> /usr/bin/rsync --quiet --links --numeric-ids --acls --perms --times
> --recursive --owner --group --delete-during --ignore-errors --backup
> --backup-dir=/wzb/backup/wzb= --password-file=/etc/wzb/rsync/password
> --filter '- /backup/' rsync://selene.wzb.eu/wzb /wzb

This was caused by the backup option putting extra entries into the ACL cache, which caused the cache to get out of sync with the sender's numbering scheme. It could also affect xattrs. I've just checked in a fix that took care of the issue in a test setup where I replicated the problem. The fix will go out in 3.0.6.

Thanks for the report!

..wayne..
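
To make the failure mode concrete, here is a small C sketch of a receiver-side cache that picks up one entry the sender never transmitted. It is a toy model under stated assumptions, not the code that was fixed: struct recv_cache, cache_add() and index_1_is_correct() are hypothetical names, ACLs are reduced to text labels, and the position of the extra entry is chosen arbitrarily.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_MAX 8

/* Hypothetical receiver-side cache of ACLs, modelled as text labels. */
struct recv_cache {
    const char *acl[CACHE_MAX];
    size_t count;
};

/* Append an ACL and return the index it was stored at. */
static size_t cache_add(struct recv_cache *c, const char *acl)
{
    assert(c->count < CACHE_MAX);
    c->acl[c->count] = acl;
    return c->count++;
}

/* Replay a tiny transfer. The sender transmitted two literal ACLs, which it
 * numbers 0 and 1, and later refers to index 1 for another file. If
 * extra_backup_entry is nonzero, the receiver also caches an ACL the sender
 * never sent (standing in for the entries added by the backup code).
 * Returns 1 if index 1 still resolves to the ACL the sender meant. */
static int index_1_is_correct(int extra_backup_entry)
{
    struct recv_cache c = { {0}, 0 };

    cache_add(&c, "user:spura:rwx");            /* sender index 0 */
    if (extra_backup_entry)
        cache_add(&c, "acl-of-backed-up-file"); /* unknown to the sender */
    cache_add(&c, "group:users:r-x");           /* sender index 1 */

    return strcmp(c.acl[1], "group:users:r-x") == 0;
}
```

Calling index_1_is_correct(0) models a run without the extra entry, where the lookup matches. With index_1_is_correct(1) every index from the insertion point onward is shifted by one, which is consistent with the report that syncing /wzb/software alone looked fine while the full run picked up ACL parts from trees processed earlier.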