Wayne,
Please consider the attached patch. This applies to the current
CVS, and is independant of patches/local-batch.diff. As a matter of
fact, I'm sure it would conflict heavily with local-batch.diff.
This version of batch mode has a couple distinguishing features:
Write-batch records (almost) the entire sender side of the conversation
into one file. ("Almost" because it has to smooth out differences
between server-sender and non-server-sender.) In theory, it should now
work with many of the other rsync options (like compression), even ones
that modify the protocol, as long as they act equally on server and
non-server.
The motivation of this patch is to significantly reduce the
impact of batch-mode code on the rest of the codebase, and to allow
batch-mode to continue to "Just Work" even as the rest of the code
evolves. Before, batch code was basically sprinkled in everywhere there
was significant i/o with the appropriate "if ({read|write}_batch)
batch_function_to_{read|write}_entire_datastructures()". Now, batch
code is essentially de-coupled from the rest of rsync. Points of
interaction are:
1) opening a batch file in main()
2) a hook in readfd() for writing batches from a receiver
3) a hook in writefd() for writing batches from a sender
4) file descriptor swaping in client_run() for reading batches
5) aborting the unneeded server in local_child() during a read_batch
6) a special-case for writing the protocol version during
an rsyncd inband exchange
7) a special-case for writing the checksum-seed
8) a special-case for not reading end-of-run statistics
Points 1-5 are very simple and clean. Points 6-8 are all the
result of protocol dependance on server-ness. Point 6 is unfortunate
but somewhat understandable. Points 7 and 8 are, IMO, less understandable.
In both cases, I think there are good reasons (unrelated to batch-mode) to
replace dependance on server-ness with dependance on sender-ness. These
are not too hard to fix, but I went to great lengths to make this patch
not require any protocol changes.
I don't know if it's good style, but it was convenient for me to
include several large comments in this patch. These comments describe
the issues related to these 3 special cases, and I would welcome feedback
via email on the comments.
The core batch functionality provided by 1-5 should be robust
against changes to the rest of rsync, _except_ for changes that
introduce new protocol dependance on server-ness. This is because those
3 special cases ensure that the batch file is the same whether there was
a server involved or not.
There are still some issues with the patch as it is. For
example: 1) I suspect one area in client_run() is non-portable. 2) This
leaves hundreds and hundreds of lines of dead-code around. But I wanted
to get some feedback on progress so far.
Future work:
This patch still needs some clean-up, but it demonstrates a very
different design approach than the existing batch-mode and it works for
all the cases I've tested.
If you think this new incarnation of batch-mode will go
main-stream then I will write up the appropriate patch for the man page
as well.
If you are open to some protocol changes with the motivation of
unifying the client/server protocol with the sender/receiver protocol,
then I can write something up.
After batch-mode cleanup, I'd like to tackle some performance
issues. But, I've also seen a few other functions that look like they
just need some clean-up.
Let me know what you think.
-chris
-------------- next part --------------
Index: batch.c
==================================================================RCS file:
/cvsroot/rsync/batch.c,v
retrieving revision 1.32
diff -c -b -d -r1.32 batch.c
*** batch.c 15 May 2004 19:31:10 -0000 1.32
--- batch.c 12 Jul 2004 00:37:45 -0000
***************
*** 25,30 ****
--- 25,31 ----
void write_batch_flist_info(int flist_count, struct file_struct **files)
{
+ return;
char filename[MAXPATHLEN];
int i, f, save_pv;
int64 save_written;
***************
*** 180,185 ****
--- 181,187 ----
**/
void write_batch_csum_info(int *flist_entry, struct sum_struct *s)
{
+ return;
size_t i;
int int_count;
char filename[MAXPATHLEN];
***************
*** 270,275 ****
--- 272,278 ----
void write_batch_delta_file(char *buff, int bytes_to_write)
{
+ return;
char filename[MAXPATHLEN];
if (f_delta < 0) {
Index: clientserver.c
==================================================================RCS file:
/cvsroot/rsync/clientserver.c,v
retrieving revision 1.127
diff -c -b -d -r1.127 clientserver.c
*** clientserver.c 13 Jun 2004 14:18:48 -0000 1.127
--- clientserver.c 12 Jul 2004 00:37:46 -0000
***************
*** 50,55 ****
--- 50,57 ----
extern struct exclude_list_struct server_exclude_list;
extern char *exclude_path_prefix;
extern char *config_file;
+ extern int write_batch;
+ extern int batch_fd;
char *auth_user;
***************
*** 97,109 ****
return ret < 0? ret : client_run(fd, fd, -1, argc, argv);
}
! int start_inband_exchange(char *user, char *path, int f_in, int f_out, int
argc)
{
int i;
char *sargs[MAX_ARGS];
int sargc = 0;
char line[MAXPATHLEN];
char *p;
if (argc == 0 && !am_sender)
list_only = 1;
--- 99,121 ----
return ret < 0? ret : client_run(fd, fd, -1, argc, argv);
}
! /* start_inband_exchange() contains an unfortunate write_batch
! * hack/workaround. The issue here is that the protocol for version
! * exchange differs when an rsyncd server is involved. However, the
! * batch file written must be the same whether a server is involved or
! * not. If only version exchange always used the same protocol...
! */
! int start_inband_exchange(char *user, char *path, int f_in, int f_out,
! int argc)
{
int i;
char *sargs[MAX_ARGS];
int sargc = 0;
char line[MAXPATHLEN];
char *p;
+ int write_batch_temp;
+ write_batch_temp = write_batch; /* save incoming mode */
+ write_batch = 0; /* don't write RSYNCD protocol to file */
if (argc == 0 && !am_sender)
list_only = 1;
***************
*** 200,205 ****
--- 212,221 ----
io_start_multiplex_in(f_in);
}
+ if (write_batch_temp) /* fudge the protocol version */
+ write_int(batch_fd, remote_protocol);
+
+ write_batch = write_batch_temp; /* restore incoming mode */
return 0;
}
Index: compat.c
==================================================================RCS file:
/cvsroot/rsync/compat.c,v
retrieving revision 1.22
diff -c -b -d -r1.22 compat.c
*** compat.c 11 May 2004 17:25:01 -0000 1.22
--- compat.c 12 Jul 2004 00:37:46 -0000
***************
*** 28,33 ****
--- 28,36 ----
int remote_protocol = 0;
extern int am_server;
+ extern int am_sender;
+ extern int write_batch;
+ extern int batch_fd;
extern int checksum_seed;
***************
*** 73,83 ****
--- 76,138 ----
exit_cleanup(RERR_PROTOCOL);
}
+ /* CAS: I think this is a good candidate for a protocol
+ * change. Instead of:
+ *
+ * if (am_server) write_int() stuff;
+ * else read_int() stuff;
+ *
+ * it is good to remove the protocol dependence on server-ness:
+ *
+ * if (am_sender) write_int() stuff;
+ * else read_int() stuff;
+ *
+ * This has the effect of halving the number of possible
+ * protocol combinations, because sender and receiver already
+ * have an asymetric protocol, and varying on server-ness
+ * means 2 more possibilities for each side.
+ *
+ * Another reason: the priviledge of choosing the checksum
+ * seed should be given to the party that may have an greater
+ * interest in specifying the actual seed value. The only
+ * reason to be interested in the selection of seed value is
+ * if you have cached checksums. However, the sender has
+ * the greater interest because a file sent can be sent again with
+ * the same checksum values, but a file received has no valid
+ * cached checksums. (Specifically, if the file was modified
+ * then the checksums cached become invalid, and if the file
+ * was new then no checksums are cached.)
+ *
+ * One could argue that the receiver could want to cache
+ * checksums for the case where a receiver-server is
+ * frequently checksumming a file but infrequently updating
+ * it. Surely this is obscure.
+ */
+
+ /*
+ * BUT, as it is, we have to catch every combination, and
+ * compensate accordingly. Thus, below, we have the ugly
+ * write_int(batch_fd, checksum_seed); If this was
+ * if(am_sender)...else... then there'd be no batch stuff here
+ * at all.
+ *
+ */
+
if (am_server) {
if (!checksum_seed)
checksum_seed = time(NULL);
write_int(f_out, checksum_seed);
+ if (!am_sender && write_batch) {
+ write_batch = 0;
+ write_int(batch_fd, checksum_seed);
+ write_batch = 1;
+ }
} else {
checksum_seed = read_int(f_in);
+ if (am_sender && write_batch) {
+ write_batch = 0;
+ write_int(batch_fd, checksum_seed);
+ write_batch = 1;
+ }
}
}
Index: flist.c
==================================================================RCS file:
/cvsroot/rsync/flist.c,v
retrieving revision 1.231
diff -c -b -d -r1.231 flist.c
*** flist.c 18 Jun 2004 16:29:21 -0000 1.231
--- flist.c 12 Jul 2004 00:37:47 -0000
***************
*** 950,958 ****
flist_expand(flist);
- if (write_batch)
- file->flags |= FLAG_TOP_DIR;
-
if (file->basename[0]) {
flist->files[flist->count++] = file;
send_file_entry(file, f, base_flags);
--- 950,955 ----
***************
*** 1301,1313 ****
* protocol version 15 */
recv_uid_list(f, flist);
- if (!read_batch) {
/* Recv the io_error flag */
if (lp_ignore_errors(module_id) || ignore_errors)
read_int(f);
else
io_error |= read_int(f);
! }
}
if (verbose > 3)
--- 1298,1309 ----
* protocol version 15 */
recv_uid_list(f, flist);
/* Recv the io_error flag */
if (lp_ignore_errors(module_id) || ignore_errors)
read_int(f);
else
io_error |= read_int(f);
!
}
if (verbose > 3)
Index: io.c
==================================================================RCS file:
/cvsroot/rsync/io.c,v
retrieving revision 1.131
diff -c -b -d -r1.131 io.c
*** io.c 23 Jun 2004 01:13:09 -0000 1.131
--- io.c 12 Jul 2004 00:37:48 -0000
***************
*** 56,61 ****
--- 56,63 ----
extern int eol_nulls;
extern char *remote_filesfrom_file;
extern struct stats stats;
+ extern int batch_fd;
+ extern int write_batch;
const char phase_unknown[] = "unknown";
int select_timeout = SELECT_TIMEOUT;
***************
*** 674,679 ****
--- 676,688 ----
total += ret;
}
+ if (write_batch && !am_sender) {
+ if (write(batch_fd, buffer, total) < 0) {
+ close(batch_fd);
+ exit_cleanup(RERR_FILEIO);
+ }
+ }
+
stats.total_read += total;
}
***************
*** 951,956 ****
--- 960,972 ----
exit_cleanup(RERR_PROTOCOL);
}
+ if (write_batch && am_sender) {
+ if (write(batch_fd, buf, len) < 0) {
+ close(batch_fd);
+ exit_cleanup(RERR_FILEIO);
+ }
+ }
+
if (!io_buffer || fd != multiplex_out_fd) {
writefd_unbuffered(fd, buf, len);
return;
Index: main.c
==================================================================RCS file:
/cvsroot/rsync/main.c,v
retrieving revision 1.202
diff -c -b -d -r1.202 main.c
*** main.c 30 Jun 2004 07:27:30 -0000 1.202
--- main.c 12 Jul 2004 00:37:49 -0000
***************
*** 58,63 ****
--- 58,65 ----
extern char *rsync_path;
extern char *shell_cmd;
extern struct file_list *batch_flist;
+ extern int batch_fd;
+ extern char *batch_prefix;
/* there's probably never more than at most 2 outstanding child processes,
***************
*** 125,130 ****
--- 127,151 ----
return;
}
+ /* CAS: I think that this is a good candidate for a protocol
+ * change. Instead of making the protocol depend on
+ * am_server, I think this should be a simple:
+ *
+ * if (am_sender)
+ * write_longint stuff;
+ * else
+ * read_longing stuff;
+ *
+ * This would have several advantages: 1) It simplifies the
+ * protocol by removing variation based on server-ness. 2) It
+ * makes the client_run()-if(am_sender) path look more like
+ * do_server_sender (ideally, these should be the same). 3)
+ * It removes the need for write_batch corrections that have
+ * to record a canonical (independent of server-ness) record
+ * of the transmission, and read_batch corrections that have
+ * to replay this canonical record.
+ */
+
if (am_server) {
if (am_sender) {
int64 w;
***************
*** 300,307 ****
}
if (local_server) {
- if (read_batch)
- create_flist_from_batch(); /* sets batch_flist */
ret = local_child(argc, args, f_in, f_out, child_main);
} else {
ret = piped_child(args,f_in,f_out);
--- 321,326 ----
***************
*** 461,466 ****
--- 480,491 ----
recv_files(f_in,flist,local_name);
io_flush(FULL_FLUSH);
+
+ /* This test for read_batch is needed because of a
+ * protocol dependency on am_server state, see
+ * report(). We are quite fortunate that this
+ * workaround is not more complicated. */
+ if (!read_batch)
report(f_in);
send_msg(MSG_DONE, "", 0);
***************
*** 543,551 ****
filesfrom_fd = -1;
}
- if (read_batch)
- flist = batch_flist;
- else
flist = recv_file_list(f_in);
if (!flist) {
rprintf(FERROR,"server_recv: recv_file_list error\n");
--- 568,573 ----
***************
*** 585,590 ****
--- 607,614 ----
if (am_sender) {
keep_dirlinks = 0; /* Must be disabled on the sender. */
+
+ /* TODO: I suspect this limitation can be removed. */
if (!read_batch) {
recv_exclude_list(f_in);
if (cvs_exclude)
***************
*** 609,623 ****
char *local_name = NULL;
cleanup_child_pid = pid;
! if (read_batch)
! flist = batch_flist;
set_nonblocking(f_in);
set_nonblocking(f_out);
setup_protocol(f_out,f_in);
! if (protocol_version >= 23)
io_start_multiplex_in(f_in);
if (am_sender) {
--- 633,658 ----
char *local_name = NULL;
cleanup_child_pid = pid;
! if (read_batch) {
! close(f_in);
! close(f_out);
+ /* This is the heart of the read_batch approach:
+ * Switcher-roo the file descriptors, and
+ * nobody's the wiser. */
+ f_in = batch_fd;
+
+ /* FIXME: How to make this portable? */
+ f_out = do_open("/dev/null", O_WRONLY, 0);
+ am_sender = 0;
+ } else {
set_nonblocking(f_in);
set_nonblocking(f_out);
+ }
setup_protocol(f_out,f_in);
! if (protocol_version >= 23 && !read_batch)
io_start_multiplex_in(f_in);
if (am_sender) {
***************
*** 629,634 ****
--- 664,672 ----
send_exclude_list(f_out);
if (remote_filesfrom_file)
filesfrom_fd = f_in;
+
+ /* Can be unconditional, but this is theoretically
+ * more efficent for read_batch case. */
if (!read_batch) /* don't write to pipe */
flist = send_file_list(f_out,argc,argv);
if (verbose > 3)
***************
*** 637,642 ****
--- 675,682 ----
io_flush(NORMAL_FLUSH);
send_files(flist,f_out,f_in);
io_flush(FULL_FLUSH);
+ /* Wondering why this doesn't look like do_server_sender()?
+ * see report(). */
if (protocol_version >= 24) {
/* final goodbye message */
read_int(f_in);
***************
*** 655,660 ****
--- 695,702 ----
if (argc == 0)
list_only = 1;
+ /* Can be unconditional, but this is theoretically more
+ * efficient for the read_batch case. */
if (!read_batch)
send_exclude_list(f_out);
***************
*** 839,845 ****
}
argc--;
} else { /* read_batch */
- am_sender = 1;
local_server = 1;
shell_path = argv[argc-1];
}
--- 881,886 ----
***************
*** 1037,1044 ****
init_flist();
! if (write_batch && !am_server) {
write_batch_argvs_file(orig_argc, orig_argv);
}
if (am_daemon && !am_server)
--- 1078,1095 ----
init_flist();
! if (write_batch || read_batch) {
! if (write_batch)
write_batch_argvs_file(orig_argc, orig_argv);
+
+ batch_fd = do_open(batch_prefix,
+ write_batch ? O_WRONLY | O_CREAT | O_TRUNC
+ : O_RDONLY, S_IRUSR | S_IWUSR);
+ if (batch_fd < 0) {
+ rsyserr(FERROR, errno, "Batch file %s open error",
+ batch_prefix);
+ exit_cleanup(RERR_FILEIO);
+ }
}
if (am_daemon && !am_server)
Index: options.c
==================================================================RCS file:
/cvsroot/rsync/options.c,v
retrieving revision 1.157
diff -c -b -d -r1.157 options.c
*** options.c 20 Jun 2004 19:47:05 -0000 1.157
--- options.c 12 Jul 2004 00:37:50 -0000
***************
*** 111,116 ****
--- 111,117 ----
int write_batch = 0;
int read_batch = 0;
+ int batch_fd = 0;
int backup_dir_len = 0;
int backup_suffix_len;
unsigned int backup_dir_remainder;
***************
*** 638,643 ****
--- 639,654 ----
}
#endif
+ if ((write_batch || read_batch) && am_server) {
+ rprintf(FERROR,
+ "batch-mode is incompatible with server mode\n");
+ write_batch = 0;
+ read_batch = 0;
+ /* We don't actually exit_cleanup(), so that we can still service
+ * older version clients that still send batch args to server */
+
+ }
+
if (write_batch && read_batch) {
rprintf(FERROR,
"write-batch and read-batch can not be used together\n");
***************
*** 884,895 ****
args[ac++] = arg;
}
! if (batch_prefix) {
! char *r_or_w = write_batch ? "write" : "read";
! if (asprintf(&arg, "--%s-batch=%s", r_or_w, batch_prefix) <
0)
! goto oom;
! args[ac++] = arg;
! }
if (io_timeout) {
if (asprintf(&arg, "--timeout=%d", io_timeout) < 0)
--- 895,901 ----
args[ac++] = arg;
}
! /* batch operations are local only, so don't pass batch args */
if (io_timeout) {
if (asprintf(&arg, "--timeout=%d", io_timeout) < 0)
Index: pipe.c
==================================================================RCS file:
/cvsroot/rsync/pipe.c,v
retrieving revision 1.8
diff -c -b -d -r1.8 pipe.c
*** pipe.c 18 Jun 2004 16:00:33 -0000 1.8
--- pipe.c 12 Jul 2004 00:37:51 -0000
***************
*** 26,31 ****
--- 26,32 ----
extern int blocking_io;
extern int orig_umask;
extern int read_batch;
+ extern int write_batch;
extern int filesfrom_fd;
/**
***************
*** 94,100 ****
return pid;
}
! pid_t local_child(int argc, char **argv,int *f_in,int *f_out,
int (*child_main)(int, char*[]))
{
pid_t pid;
--- 95,113 ----
return pid;
}
! /*
! * This function forks a child which calls child_main(). First,
! * however, it has to establish communication paths to and from the
! * newborn child. It creates two socket pairs -- one for writing to
! * the child (from the parent) and one for reading from the child
! * (writing to the parent). Since that's four socket ends, each
! * process has to close the two ends it doesn't need. The remaining
! * two socket ends are retained for reading and writing. In the
! * child, the STDIN and STDOUT file descriptors refer to these
! * sockets. In the parent, the function arguments f_in and f_out are
! * set to refer to these sockets.
! */
! pid_t local_child(int argc, char **argv, int *f_in, int *f_out,
int (*child_main)(int, char*[]))
{
pid_t pid;
***************
*** 117,122 ****
--- 130,143 ----
am_sender = read_batch ? 0 : !am_sender;
am_server = 1;
+ /* Either sender or receiver can write the file for
+ * write_batch mode. In local_child, both are local
+ * processes, so we must make sure that only one
+ * actually writes. It shouldn't matter which one,
+ * because they must produce the same file.
+ */
+ write_batch = 0;
+
if (!am_sender)
filesfrom_fd = -1;
***************
*** 129,134 ****
--- 150,156 ----
}
if (to_child_pipe[0] != STDIN_FILENO) close(to_child_pipe[0]);
if (from_child_pipe[1] != STDOUT_FILENO) close(from_child_pipe[1]);
+ if (read_batch) exit_cleanup(0); /* no reason to continue */
child_main(argc, argv);
}