Box, Wallace
2007-Sep-26 23:25 UTC
Copies and deletions from a single list of files / directories?
Hi rsync folks -

I've got an interesting scenario that I need help with. I'd like to sync files and directories between a very large source and target - about 350,000 files in the directory tree. When I run rsync between the two to figure out the differences, it takes about an hour.

However, I do have available on the source server an exact, simple list of what should be copied and also what should be deleted, and if I could control the rsync processing with this file, I should be able to cut down the process to just a few minutes.

But there's no distinction between the two in the list. It's just a list of file and directory names - some to be synced to the target, and some to be deleted from the target. It's almost the same information that would ultimately be generated from using the 'batch mode' rsync option.

Here's what I'd ideally like to happen. I'd pass the list to rsync, using --files-from. Then, for each entry in the list, perform the following:
1) if the entry does not exist on the source, delete it from the target. This would apply to either a directory or a file.
2) if the entry exists on the source, sync it to the target. If it's a file, just copy the file over. If it's a directory, sync the directory so that the target's directory matches the source directory.

Is this possible?

The closest we've come is to:
1) take the list and prepend / before each entry,
2) duplicate the list,
3) append /** onto each entry in the duplicated list,
4) concatenate the 2 lists together,
5) use this list as the input to --include-from, also using -r --exclude=* on the command line.

I'm just wondering if there's a cleaner approach. Ideas?

Many thanks,
Wally
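For readers who want to try the include-list workaround described above, steps 1 to 4 can be sketched roughly as follows. This is only an illustration, not the poster's actual script: the function name build_include_list and the file names in the usage comment are made up, and the entries are assumed to be relative paths, one per line.

```shell
#!/bin/bash
# build_include_list (hypothetical name): read entries on stdin, one
# relative path per line, and print the combined pattern list.
build_include_list() {
    local entries
    entries=$(cat)
    [ -n "$entries" ] || return 0    # nothing to do for an empty list
    # Step 1: prepend / to each entry.
    printf '%s\n' "$entries" | sed 's|^|/|'
    # Steps 2-4: the same entries again with /** appended, printed after
    # the first list so the two end up concatenated.
    printf '%s\n' "$entries" | sed 's|^|/|; s|$|/**|'
}

# Hypothetical usage (file names are placeholders):
#   build_include_list <changes.list >includes.list
```

The output would then be given to --include-from as in step 5. One caveat, as far as the rsync filter rules go: with --exclude=* in effect, an entry nested below the top level is only reached if its parent directories are not excluded, so those may need include patterns of their own.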
Matt McCutchen
2007-Sep-27 13:01 UTC
Copies and deletions from a single list of files / directories?
On 9/26/07, Box, Wallace <Wallace.Box@nike.com> wrote:
> However, I do have available on the source server an exact, simple list of
> what should be copied and also what should be deleted, and if I could
> control the rsync processing with this file, I should be able to cut down
> the process to just a few minutes.
>
> But there's no distinction between the two in the list. It's just a list of
> file and directory names - some to be synced to the target, and some to be
> deleted from the target. It's almost the same information that would
> ultimately be generated from using the 'batch mode' rsync option.
>
> Here's what I'd ideally like to happen. I'd pass the list to rsync, using
> --files-from. Then, for each entry in the list, perform the following:
> 1) if the entry does not exist on the source, delete it from the target.
> This would apply to either a directory or a file.
> 2) if the entry exists on the source, sync it to the target. If it's a
> file, just copy the file over. If it's a directory, sync the directory so
> that the target's directory matches the source directory.
>
> Is this possible?

No, rsync does not currently support requests to consider individual files for deletion because it views deletion of a file as part of the processing of its nearest ancestor that still exists on the source. Support for such requests would be a nice addition; a way of expressing the absence of a file in the file list would need to be decided upon.

In the meantime, your best bet is to process the copies and deletions separately. Start with a little script that splits the change list into copy and deletion lists. Something like this would do:

    #!/bin/bash
    # Assumes it is run from the source root so relative names resolve.
    # Existing entries go to copies.list, missing ones to deletions.list (fd 3).
    while IFS='' read -r fname; do
        if [ -e "$fname" ]; then
            printf '%s\n' "$fname"
        else
            printf '%s\n' "$fname" >&3
        fi
    done <changes.list >copies.list 3>deletions.list

Then run the copies using "rsync --files-from". For the deletions, the easiest thing would be to copy the list to the target and give it to "xargs rm" or similar there.
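As one possible take on the "or similar" part, here is a minimal sketch of a deletion step to run on the target. It is an assumption-laden illustration rather than anything from the thread: the function name apply_deletions is made up, the list is assumed to hold paths relative to the target root, and rm -rf is swapped in for plain rm because the list may name directories as well as files.

```shell
#!/bin/bash
# apply_deletions TARGET_ROOT (hypothetical name): read paths relative to
# TARGET_ROOT on stdin and remove each one, file or directory tree.
apply_deletions() {
    local root=$1 path
    while IFS='' read -r path; do
        [ -n "$path" ] || continue    # skip blank lines
        case $path in
            ..|../*|*/..|*/../*)
                # Refuse entries that could climb out of the target root.
                echo "skipping suspicious entry: $path" >&2
                continue
                ;;
        esac
        rm -rf -- "$root/$path"
    done
}

# Hypothetical usage on the target (paths are placeholders):
#   apply_deletions /path/to/target <deletions.list
```

Reading line by line instead of piping into xargs keeps names with spaces intact, at the cost of one rm call per entry.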
Or, if the only access you have to the target is through rsync, you could make a new list in which each file is replaced by its nearest ancestor directory that still exists on the source and then give the list to "rsync --files-from --delete" to have it consider the files in each directory for deletion.

Matt
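The nearest-ancestor list described above could be built along these lines. This is a sketch under stated assumptions, not code from the thread: the function name nearest_ancestors is hypothetical, the input is assumed to be the deletion list with paths relative to the source root, and "." stands for the source root itself when no closer ancestor survives.

```shell
#!/bin/bash
# nearest_ancestors SRC_ROOT (hypothetical name): read deleted paths,
# relative to SRC_ROOT, on stdin and print each nearest ancestor
# directory that still exists under SRC_ROOT, once.
nearest_ancestors() {
    local root=$1 path dir
    while IFS='' read -r path; do
        [ -n "$path" ] || continue    # skip blank lines
        dir=$(dirname -- "$path")
        # Climb one level at a time until the directory exists on the
        # source, or until we reach the root (".").
        while [ "$dir" != "." ] && [ ! -d "$root/$dir" ]; do
            dir=$(dirname -- "$dir")
        done
        printf '%s\n' "$dir"
    done | sort -u
}

# Hypothetical usage (paths are placeholders):
#   nearest_ancestors /path/to/source <deletions.list >ancestors.list
```

The resulting list would then go to "rsync --files-from --delete" as suggested. As far as the rsync manual describes it, -r has to be given explicitly together with --files-from for the listed directories to be recursed into, and rsync will then look at everything inside each listed directory, not only the deleted entries.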