Greetings,

I'd like to propose a new option to rsync that makes it read device files as if they were regular files. This covers pipes (FIFOs), character devices and block devices (I'm not sure about sockets).

The main motivation is cases where you need to synchronize a large amount of data that is not available as regular files, as in the following scenarios:

* Keep a copy of a block device:
  http://rsync.samba.org/cgi-bin/rsync.fom?auth=ckdf646d354e73&editCmds=compact&file=87
* Make backups using a huge tar file instead of actual files, so that you don't need to be root in order to preserve attributes:
  http://rsync.samba.org/cgi-bin/rsync.fom?auth=ckdf646d354e73&editCmds=compact&file=228
* Keep an SQL-formatted dump of database tables (i.e., the output of mysqldump) for backup purposes.

In all three cases, using a temporary file may require an unreasonable amount of additional disk space, and there is no good reason to mandate it.

Attached is a largish patch against rsync-2.5.5 that adds this feature. The primary issues are as follows:

1. Addition of a "--read-devices" option that activates the device reading behavior, with the corresponding handling in a few locations.

2. The code and usage of map_ptr() required the file size to be known in advance. This is impossible for devices and FIFOs. To address this, I changed the interface of map_ptr(). It now returns void rather than char*, and updates two new members of map_struct:
   - m_ptr contains the pointer to the new data
   - m_len contains the length of the new data (at most the length requested, but less if EOF is encountered)
   Also, the map_struct->file_size field is now updated by map_ptr() when it encounters an EOF. I updated all invocations of map_ptr() to use the new interface and to act correctly when a premature EOF is encountered. Apart from supporting the new feature, this makes the code more robust to EOFs (no more zeroing buffer memory and hoping for the best!) and, arguably, more elegant.
   The change affected many locations, but is pretty straightforward. I tried to make minimal changes rather than eliminate all traces of the predetermined-length assumption, since that would greatly inflate the patch. Also, this shouldn't prevent map_ptr() from becoming a wrapper around mmap() in the future, though the map_ptr() code will then need to contain a fallback to read().

3. Character devices and FIFO files can be read only once, so we can't "redo" them in the adaptive csum_length scheme (I think it's safe to assume that block devices can be read multiple times, though). Thus such files must always be transferred with a generous csum_length, say, at least 8 bytes. I'm not sure how to handle this protocol-wise; as a temporary measure, I increased the initial csum_length from 2 to 8 if the --read-devices argument is given. This option is passed to the server, so older rsyncs will choke on it rather than fail due to a protocol mismatch.

4. FIFOs and character devices do not support the seek operation. It so happens that the combination of the code in map_ptr() and the way it's used guarantees that the transmitter always reads the underlying file sequentially, without seeks, for reasonable block sizes. This seems to be the case both from looking at the code and from testing it. Alas, this property is fragile and may break with changes in constants or code. I don't see how this can be fixed without pulling out all the map_ptr() code.

5. A few minor changes were required to keep the statistics straight. For instance, the token sending functions now return the number of data bytes they read.

A few things I wasn't sure about are marked by "#ET#".

I hope it will be possible to incorporate this patch into rsync, after proper review and testing.
Regards, Eran Tromer -------------- next part -------------- diff -r -u4 rsync-2.5.5/checksum.c rsync-patched/checksum.c --- rsync-2.5.5/checksum.c Tue Oct 26 01:04:09 1999 +++ rsync-patched/checksum.c Mon Aug 5 10:05:15 2002 @@ -24,8 +24,9 @@ #define CSUM_CHUNK 64 int checksum_seed = 0; extern int remote_version; +extern int read_devices; /* a simple 32 bit checksum that can be upadted from either end (inspired by Mark Adler's Adler-32 checksum) @@ -73,43 +74,40 @@ for(i = 0; i + CSUM_CHUNK <= len; i += CSUM_CHUNK) { mdfour_update(&m, (uchar *)(buf1+i), CSUM_CHUNK); } - if (len - i > 0) { + if (len - i > 0) { /* make this ">=" to address http://lists.samba.org/pipermail/rsync/2002-August/008011.html */ mdfour_update(&m, (uchar *)(buf1+i), (len-i)); } mdfour_result(&m, (uchar *)sum); } -void file_checksum(char *fname,char *sum,OFF_T size) +void file_checksum(char *fname,char *sum) { - OFF_T i; + OFF_T i = 0; struct map_struct *buf; int fd; - OFF_T len = size; char tmpchunk[CSUM_CHUNK]; struct mdfour m; memset(sum,0,MD4_SUM_LENGTH); fd = do_open(fname, O_RDONLY, 0); if (fd == -1) return; - buf = map_file(fd,size); + buf = map_file(fd,OFF_T_MAX); /* map_ptr() will find out the real file size */ mdfour_begin(&m); - for(i = 0; i + CSUM_CHUNK <= len; i += CSUM_CHUNK) { - memcpy(tmpchunk, map_ptr(buf,i,CSUM_CHUNK), CSUM_CHUNK); - mdfour_update(&m, (uchar *)tmpchunk, CSUM_CHUNK); - } - - if (len - i > 0) { - memcpy(tmpchunk, map_ptr(buf,i,len-i), len-i); - mdfour_update(&m, (uchar *)tmpchunk, (len-i)); - } + do { + map_ptr(buf,i,CSUM_CHUNK); + memcpy(tmpchunk, buf->m_ptr, buf->m_len); + mdfour_update(&m, (uchar *)tmpchunk, buf->m_len); + i += CSUM_CHUNK; + } while (buf->m_len>0); + /* mdfour_update(&m, NULL, 0); uncomment to address http://lists.samba.org/pipermail/rsync/2002-August/008011.html */ mdfour_result(&m, (uchar *)sum); close(fd); @@ -118,12 +116,19 @@ void checksum_init(void) { - if (remote_version >= 14) - csum_length = 2; /* adaptive */ - else - csum_length = 
SUM_LENGTH; + if (remote_version >= 14) { + /* adaptively increasing checksum length, relying on multiple phases: */ + if (read_devices) + csum_length = MIN(8,SUM_LENGTH); + /* Because streams must succeed on first attempt. This is a protocol change! + It would be better to allow checksum lengths to differ between files. #ET# XXX */ + else + csum_length = 2; + } else + csum_length = SUM_LENGTH; + rprintf(FINFO, "#ET# csum_length=%d\n", csum_length); } @@ -171,9 +176,9 @@ } void sum_end(char *sum) { - if (sumresidue) { + if (sumresidue) { /* remove the test to address http://lists.samba.org/pipermail/rsync/2002-August/008011.html */ mdfour_update(&md, (uchar *)sumrbuf, sumresidue); } mdfour_result(&md, (uchar *)sum); Only in rsync-patched: config.h diff -r -u4 rsync-2.5.5/fileio.c rsync-patched/fileio.c --- rsync-2.5.5/fileio.c Sat Jan 26 02:07:34 2002 +++ rsync-patched/fileio.c Mon Aug 5 10:05:15 2002 @@ -111,32 +111,43 @@ map->p_size = 0; map->p_offset = 0; map->p_fd_offset = 0; map->p_len = 0; + map->m_ptr = NULL; + map->m_len = 0; return map; } -/* slide the read window in the file */ -char *map_ptr(struct map_struct *map,OFF_T offset,int len) +/* slide the read window in the file. + * The state of the window is saved in the map->p* members. + * A subrange of this window corresponding to the request is + * saved in the map->m_* members, for the use of the caller. 
*/ +void map_ptr(struct map_struct *map,OFF_T offset,int len) { - int nread; OFF_T window_start, read_start; int window_size, read_size, read_offset; + if (verbose>3) rprintf(FINFO, "#ET# map_ptr(%d,%.0f,%.0f)\n",(int)map,(double)offset,(double)len); - if (len == 0) { - return NULL; + /* produce a NULL result if past (currently known) end of file */ + if (len == 0 || offset>=map->file_size) { + map->m_ptr = NULL; + map->m_len = 0; + return; } - /* can't go beyond the end of file */ + /* limit read length to (currently known) end of file */ if (len > (map->file_size - offset)) { len = map->file_size - offset; } /* in most cases the region will already be available */ if (offset >= map->p_offset && offset+len <= map->p_offset+map->p_len) { - return (map->p + (offset - map->p_offset)); + map->m_ptr = map->p + (offset - map->p_offset); + map->m_len = len; + if (verbose>3) rprintf(FINFO, "#ET# map_ptr: already in buf; p_offset=%.0f p_len=%.0f m_ptr-p=%.0f m_len=%.0f\n", (double)map->p_offset, (double)map->p_len, (double)(map->m_ptr-map->p), (double)map->m_len); + return; } /* nope, we are going to have to do a read. Work out our desired window */ @@ -175,32 +186,48 @@ read_size = window_size; read_offset = 0; } + if (verbose>2) rprintf(FINFO, "#ET# read_size=%.0f\n", (double)read_size); if (read_size <= 0) { - rprintf(FINFO,"Warning: unexpected read size of %d in map_ptr\n", read_size); + rprintf(FERROR,"Warning: unexpected read size of %d in map_ptr\n", read_size); + /* Should never happen. */ + } else if (read_start > map->file_size) { + rprintf(FERROR,"Warning: read start is past known file size in map_ptr\n"); + /* Should never happen. Expect havoc if streams are involved. 
*/ } else { - if (map->p_fd_offset != read_start) { + int nread = 0; + if (map->p_fd_offset != read_start) { /* seek */ + rprintf(FINFO, "#ET# seeking; p_fd_offset=%.0f read_start=%.0f\n",(double)map->p_fd_offset,(double)read_start); if (do_lseek(map->fd,read_start,SEEK_SET) != read_start) { rprintf(FERROR,"lseek failed in map_ptr\n"); + /* Specifically, this will always happen if we seek in FIFO + * streams. That should never happen. */ exit_cleanup(RERR_FILEIO); } map->p_fd_offset = read_start; } - - if ((nread=read(map->fd,map->p + read_offset,read_size)) != read_size) { - if (nread < 0) nread = 0; - /* the best we can do is zero the buffer - the file - has changed mid transfer! */ - memset(map->p+read_offset+nread, 0, read_size - nread); + /* Try getting len bytes, giving up only if EOF. */ + while (nread<read_size) { + int nread1 = read(map->fd,map->p + read_offset+nread,read_size-nread); + if (nread1==0) { + /* EOF. For streams, only now we know the real file length. */ + if (map->file_size != map->p_fd_offset && verbose > 1) + rprintf(FINFO, "Encountered EOF after %.0f bytes\n", (double)map->p_fd_offset); + map->file_size = map->p_fd_offset; + window_size = map->p_fd_offset-window_start; + break; + } + nread += nread1; + map->p_fd_offset += nread1; } - map->p_fd_offset += nread; } map->p_offset = window_start; map->p_len = window_size; - return map->p + (offset - map->p_offset); + map->m_ptr = map->p + (offset - map->p_offset); + map->m_len = MIN(len, map->p_offset+map->p_len-offset); } void unmap_file(struct map_struct *map) diff -r -u4 rsync-2.5.5/flist.c rsync-patched/flist.c --- rsync-2.5.5/flist.c Fri Mar 15 00:20:20 2002 +++ rsync-patched/flist.c Mon Aug 5 10:05:15 2002 @@ -729,9 +729,9 @@ /* drat. 
we have to provide a null checksum for non-regular files in order to be compatible with earlier versions of rsync */ if (S_ISREG(st.st_mode)) { - file_checksum(fname, file->sum, st.st_size); + file_checksum(fname, file->sum); } else { memset(file->sum, 0, MD4_SUM_LENGTH); } } @@ -747,9 +747,9 @@ } else { file->basedir = NULL; } - if (!S_ISDIR(st.st_mode)) + if (S_ISREG(st.st_mode)) stats.total_size += st.st_size; return file; } diff -r -u4 rsync-2.5.5/generator.c rsync-patched/generator.c --- rsync-2.5.5/generator.c Mon Mar 25 08:54:31 2002 +++ rsync-patched/generator.c Mon Aug 5 10:05:15 2002 @@ -39,8 +39,9 @@ extern int io_timeout; extern int remote_version; extern int always_checksum; extern int modify_window; +extern int read_devices; extern char *compare_dest; /* choose whether to skip a particular file */ @@ -63,9 +64,9 @@ compare_dest,fname); fname = fnamecmpdest; } } - file_checksum(fname,sum,st->st_size); + file_checksum(fname,sum); if (remote_version < 21) { return (memcmp(sum,file->sum,2) == 0); } else { return (memcmp(sum,file->sum,MD4_SUM_LENGTH) == 0); @@ -119,8 +120,9 @@ write_buf(f_out, s->sums[i].sum2, csum_length); } } else { /* we don't have checksums */ + rprintf(FINFO, "#ET# send_sums: don't have checksums\n"); write_int(f_out, 0); write_int(f_out, block_size); write_int(f_out, 0); } @@ -193,24 +195,23 @@ s->sums = (struct sum_buf *)malloc(sizeof(s->sums[0])*s->count); if (!s->sums) out_of_memory("generate_sums"); for (i=0;i<count;i++) { - int n1 = MIN(len,n); - char *map = map_ptr(buf,offset,n1); + map_ptr(buf,offset,n); - s->sums[i].sum1 = get_checksum1(map,n1); - get_checksum2(map,n1,s->sums[i].sum2); + s->sums[i].sum1 = get_checksum1(buf->m_ptr,buf->m_len); + get_checksum2(buf->m_ptr,buf->m_len,s->sums[i].sum2); s->sums[i].offset = offset; - s->sums[i].len = n1; + s->sums[i].len = buf->m_len; s->sums[i].i = i; if (verbose > 3) rprintf(FINFO,"chunk[%d] offset=%.0f len=%d sum1=%08x\n", 
i,(double)s->sums[i].offset,s->sums[i].len,s->sums[i].sum1); - len -= n1; - offset += n1; + len -= buf->m_len; + offset += buf->m_len; } return s; } @@ -245,8 +246,9 @@ if (verbose > 2) rprintf(FINFO,"recv_generator(%s,%d)\n",fname,i); statret = link_stat(fname,&st); + { int errsave=errno; rprintf(FINFO, "#ET# recv_generator: statret==%d, errno=%d\n", statret, errno); errno=errsave; } if (only_existing && statret == -1 && errno == ENOENT) { /* we only want to update existing files */ if (verbose > 1) rprintf(FINFO, "not creating new file \"%s\"\n",fname); @@ -337,9 +339,9 @@ return; } #ifdef HAVE_MKNOD - if (am_root && preserve_devices && IS_DEVICE(file->mode)) { + if (!read_devices && am_root && preserve_devices && IS_DEVICE(file->mode)) { if (statret != 0 || st.st_mode != file->mode || st.st_rdev != file->rdev) { delete_file(fname); @@ -361,15 +363,23 @@ #endif if (preserve_hard_links && check_hard_link(file)) { if (verbose > 1) - rprintf(FINFO, "recv_generator: \"%s\" is a hard link\n",f_name(file)); + rprintf(FINFO, "recv_generator: \"%s\" is a hard link\n",fname); return; } if (!S_ISREG(file->mode)) { - rprintf(FINFO, "skipping non-regular file \"%s\"\n",fname); - return; + if (read_devices && (S_ISFIFO(file->mode) || S_ISBLK(file->mode) || S_ISCHR(file->mode))) { + if (verbose > 1) { + int saveerrno = errno; /* here and elsewhere -- should be replaced by explicitly saving errno after stat. 
#ET#XXX */ + rprintf(FINFO,"%s is a FIFO or device, copying its content to a regular file.\n", f_name(file)); + errno = saveerrno; + } + } else { + rprintf(FINFO, "skipping non-regular file \"%s\"\n",fname); + return; + } } fnamecmp = fname; @@ -385,8 +395,9 @@ else fnamecmp = fnamecmpbuf; } + { int errsave=errno; rprintf(FINFO, "#ET# recv_generator: statret==%d, errno=%d\n", statret, errno); errno=errsave; } if (statret == -1) { if (errno == ENOENT) { write_int(f_out,i); if (!dry_run) send_sums(NULL,f_out); @@ -433,8 +444,9 @@ return; } if (disable_deltas_p()) { + rprintf(FINFO, "#ET# recv_generator: disable_deltas_p()==TRUE!\n"); write_int(f_out,i); send_sums(NULL,f_out); return; } diff -r -u4 rsync-2.5.5/match.c rsync-patched/match.c --- rsync-2.5.5/match.c Sun Feb 3 04:38:39 2002 +++ rsync-patched/match.c Mon Aug 5 10:05:15 2002 @@ -95,26 +95,27 @@ { OFF_T n = offset - last_match; OFF_T j; + if (verbose > 2 && i >= 0) rprintf(FINFO,"match at %.0f last_match=%.0f j=%d len=%d n=%.0f\n", (double)offset,(double)last_match,i,s->sums[i].len,(double)n); - send_token(f,i,buf,last_match,n,i<0?0:s->sums[i].len); - data_transfer += n; + data_transfer += send_token(f,i,buf,last_match,n,i<0?0:s->sums[i].len);; if (i >= 0) { stats.matched_data += s->sums[i].len; n += s->sums[i].len; } - for (j=0;j<n;j+=CHUNK_SIZE) { int n1 = MIN(CHUNK_SIZE,n-j); - sum_update(map_ptr(buf,last_match+j,n1),n1); + if (verbose>2) rprintf(FINFO, "#ET#>matched() calling sum_update(buf,%.0f,%.0f)\n", (double)last_match+j,(double)n1); + map_ptr(buf,last_match+j,n1); + if (buf->m_len==0) break; + sum_update(buf->m_ptr,buf->m_len); } - if (i >= 0) last_match = offset + s->sums[i].len; else last_match = offset; @@ -127,45 +128,41 @@ } static void hash_search(int f,struct sum_struct *s, - struct map_struct *buf,OFF_T len) + struct map_struct *buf) { - OFF_T offset, end; + OFF_T offset; int j,k, last_i; char sum2[SUM_LENGTH]; uint32 s1, s2, sum; - schar *map; /* last_i is used to encourage adjacent 
matches, allowing the RLL coding of the output to work more efficiently */ last_i = -1; if (verbose > 2) - rprintf(FINFO,"hash search b=%d len=%.0f\n",s->n,(double)len); + rprintf(FINFO,"hash search b=%d\n",s->n); - k = MIN(len, s->n); - - map = (schar *)map_ptr(buf,0,k); - - sum = get_checksum1((char *)map, k); + map_ptr(buf,0,s->n); + k = buf->m_len; + sum = get_checksum1(buf->m_ptr, k); s1 = sum & 0xFFFF; s2 = sum >> 16; if (verbose > 3) rprintf(FINFO, "sum=%.8x k=%d\n", sum, k); offset = 0; - end = len + 1 - s->sums[s->count-1].len; - if (verbose > 3) - rprintf(FINFO,"hash search s->n=%d len=%.0f count=%d\n", - s->n,(double)len,s->count); + rprintf(FINFO,"hash search s->n=%d count=%d\n", + s->n,s->count); - do { + while (k >= s->sums[s->count-1].len) { /* look for a matching chunk as long as it might exist */ tag t = gettag2(s1,s2); int done_csum2 = 0; - + if (verbose>4) rprintf(FINFO, "#ET#>hash_search: iteration, offset=%.0f\n", (double)offset); + j = tag_table[t]; if (verbose > 4) rprintf(FINFO,"offset=%.0f sum=%08x\n",(double)offset,sum); @@ -175,23 +172,26 @@ sum = (s1 & 0xffff) | (s2 << 16); tag_hits++; for (; j < (int) s->count && targets[j].t == t; j++) { - int l, i = targets[j].i; + int i = targets[j].i; if (sum != s->sums[i].sum1) continue; /* also make sure the two blocks are the same length */ - l = MIN(s->n,len-offset); - if (l != s->sums[i].len) continue; + if (s->sums[i].len != k) continue; /* This is more restrictive than before stream support. 
#ET#*/ if (verbose > 3) rprintf(FINFO,"potential match at %.0f target=%d %d sum=%08x\n", (double)offset,j,i,sum); if (!done_csum2) { - map = (schar *)map_ptr(buf,offset,l); - get_checksum2((char *)map,l,sum2); + map_ptr(buf,offset,k); + if (buf->m_len != k) { + rprintf(FERROR,"error while reading block for checksum2 calculation, offset=%.0f len=%d\n", (double)offset, k); + break; + } + get_checksum2(buf->m_ptr,k,sum2); done_csum2 = 1; } if (memcmp(sum2,s->sums[i].sum2,csum_length) != 0) { @@ -213,29 +213,33 @@ } } last_i = i; - + + if (verbose) rprintf(FINFO, "#ET#>hash_search: block match at offset=%.0f\n", (double)offset); matched(f,s,buf,offset,i); offset += s->sums[i].len - 1; - k = MIN((len-offset), s->n); - map = (schar *)map_ptr(buf,offset,k); - sum = get_checksum1((char *)map, k); + + map_ptr(buf,offset,s->n); + k = buf->m_len; + sum = get_checksum1(buf->m_ptr, k); + s1 = sum & 0xFFFF; s2 = sum >> 16; matches++; break; } null_tag: /* Trim off the first byte from the checksum */ - map = (schar *)map_ptr(buf,offset,k+1); - s1 -= map[0] + CHAR_OFFSET; - s2 -= k * (map[0]+CHAR_OFFSET); + map_ptr(buf,offset,k+1); + if (buf->m_len==0) break; /* encountered EOF */ + s1 -= buf->m_ptr[0] + CHAR_OFFSET; + s2 -= k * (buf->m_ptr[0]+CHAR_OFFSET); /* Add on the next byte (if there is one) to the checksum */ - if (k < (len-offset)) { - s1 += (map[k]+CHAR_OFFSET); + if (buf->m_len>k) { + s1 += (buf->m_ptr[k]+CHAR_OFFSET); s2 += s1; } else { --k; } @@ -247,19 +251,23 @@ running match, the checksum update and the literal send. 
*/ if (offset > last_match && offset-last_match >= CHUNK_SIZE+s->n && - (end-offset > CHUNK_SIZE)) { + (buf->file_size - offset > CHUNK_SIZE)) { + rprintf(FINFO, "#ET#>hash_search: writing literal data at offset %.0f\n", (double)offset); matched(f,s,buf,offset - s->n, -2); } - } while (++offset < end); + ++offset; + } - matched(f,s,buf,len,-1); - map_ptr(buf,len-1,1); + rprintf(FINFO, "#ET#>hash_search: writing -1 token\n"); + matched(f,s,buf,buf->file_size,-1); + + /*#ET# Why was this needed? map_ptr(buf,len-1,1); */ } -void match_sums(int f,struct sum_struct *s,struct map_struct *buf,OFF_T len) +void match_sums(int f,struct sum_struct *s,struct map_struct *buf) { char file_sum[MD4_SUM_LENGTH]; extern int write_batch; /* dw */ @@ -269,27 +277,33 @@ matches=0; data_transfer=0; sum_init(); + rprintf(FINFO, "#ET#>match_sums(), s->count=%.0f, csum_length=%d\n", (double)s->count, csum_length); - if (len > 0 && s->count>0) { + if (buf==NULL) { + rprintf(FINFO, "#ET#>match_sums() sends an empty file (buf=NULL)\n"); + matched(f,s,NULL,0,-1); + } else if (buf->file_size>0 && s->count>0) { build_hash_table(s); if (verbose > 2) rprintf(FINFO,"built hash table\n"); - hash_search(f,s,buf,len); + hash_search(f,s,buf); if (verbose > 2) rprintf(FINFO,"done hash search\n"); } else { OFF_T j; + rprintf(FINFO, "#ET#>match_sums() will send literal data\n"); /* by doing this in pieces we avoid too many seeks */ - for (j=0;j<(len-CHUNK_SIZE);j+=CHUNK_SIZE) { - int n1 = MIN(CHUNK_SIZE,(len-CHUNK_SIZE)-j); - matched(f,s,buf,j+n1,-2); + for (j=0; j+CHUNK_SIZE < buf->file_size; j+=CHUNK_SIZE) { + rprintf(FINFO, "#ET#>PIECE AT OFFSET %lld\n", j); + matched(f,s,buf,j+CHUNK_SIZE,-2); } - matched(f,s,buf,len,-1); + rprintf(FINFO, "#ET#>match_sums: writing -1 token\n"); + matched(f,s,buf,buf->file_size,-1); } sum_end(file_sum); diff -r -u4 rsync-2.5.5/options.c rsync-patched/options.c --- rsync-2.5.5/options.c Tue Mar 19 23:16:42 2002 +++ rsync-patched/options.c Mon Aug 5 10:05:15 2002 @@ -86,8 
+86,9 @@ #else int modify_window=0; #endif int blocking_io=-1; +int read_devices=0; /** Network address family. **/ #ifdef INET6 int default_af_hint = 0; /* Any protocol */ @@ -218,8 +219,9 @@ rprintf(F," -p, --perms preserve permissions\n"); rprintf(F," -o, --owner preserve owner (root only)\n"); rprintf(F," -g, --group preserve group\n"); rprintf(F," -D, --devices preserve devices (root only)\n"); + rprintf(F," --read-devices read content of device and fifo files\n"); rprintf(F," -t, --times preserve times\n"); rprintf(F," -S, --sparse handle sparse files efficiently\n"); rprintf(F," -n, --dry-run show what would have been transferred\n"); rprintf(F," -W, --whole-file copy whole files, no incremental checks\n"); @@ -286,9 +288,9 @@ OPT_LOG_FORMAT, OPT_PASSWORD_FILE, OPT_SIZE_ONLY, OPT_ADDRESS, OPT_DELETE_AFTER, OPT_EXISTING, OPT_MAX_DELETE, OPT_BACKUP_DIR, OPT_IGNORE_ERRORS, OPT_BWLIMIT, OPT_BLOCKING_IO, OPT_NO_BLOCKING_IO, OPT_WHOLE_FILE, OPT_NO_WHOLE_FILE, - OPT_MODIFY_WINDOW, OPT_READ_BATCH, OPT_WRITE_BATCH, OPT_IGNORE_EXISTING}; + OPT_MODIFY_WINDOW, OPT_READ_BATCH, OPT_WRITE_BATCH, OPT_IGNORE_EXISTING, OPT_READ_DEVICES}; static struct poptOption long_options[] = { /* longName, shortName, argInfo, argPtr, value, descrip, argDesc */ {"version", 0, POPT_ARG_NONE, 0, OPT_VERSION, 0, 0}, @@ -360,8 +362,9 @@ {"backup-dir", 0, POPT_ARG_STRING, &backup_dir , 0, 0, 0 }, {"hard-links", 'H', POPT_ARG_NONE, &preserve_hard_links , 0, 0, 0 }, {"read-batch", 0, POPT_ARG_STRING, &batch_prefix, OPT_READ_BATCH, 0, 0 }, {"write-batch", 0, POPT_ARG_STRING, &batch_prefix, OPT_WRITE_BATCH, 0, 0 }, + {"read-devices", 0, POPT_ARG_NONE, &read_devices, 0, 0, 0 }, #ifdef INET6 {0, '4', POPT_ARG_VAL, &default_af_hint, AF_INET , 0, 0 }, {0, '6', POPT_ARG_VAL, &default_af_hint, AF_INET6 , 0, 0 }, #endif @@ -590,8 +593,16 @@ " write-batch or read-batch\n"); return 0; } + if (preserve_devices && read_devices) { + snprintf(err_buf,sizeof(err_buf), + "devices and read-devices can not be used 
together\n"); + rprintf(FERROR,"ERROR: devices and read-devices" + " can not be used together\n"); + return 0; + } + *argv = poptGetArgs(pc); if (*argv) *argc = count_args(*argv); else @@ -766,8 +777,16 @@ if (opt_ignore_existing && am_sender) args[ac++] = "--ignore-existing"; + if (read_devices) { + args[ac++] = "--read-devices"; + /* older versions will choke on this, which is OK since + * this option causes an increase in csum_length. See note about + * this in checksum.c. #ET# + */ + } + if (tmpdir) { args[ac++] = "--temp-dir"; args[ac++] = tmpdir; } diff -r -u4 rsync-2.5.5/proto.h rsync-patched/proto.h --- rsync-2.5.5/proto.h Mon Mar 25 06:51:17 2002 +++ rsync-patched/proto.h Mon Aug 5 10:05:15 2002 @@ -25,9 +25,9 @@ void show_flist(int index, struct file_struct **fptr); void show_argvs(int argc, char *argv[]); uint32 get_checksum1(char *buf1,int len); void get_checksum2(char *buf,int len,char *sum); -void file_checksum(char *fname,char *sum,OFF_T size); +void file_checksum(char *fname,char *sum); void checksum_init(void); void sum_init(void); void sum_update(char *p,int len); void sum_end(char *sum); @@ -71,9 +71,9 @@ void add_cvs_excludes(void); int sparse_end(int f); int write_file(int f,char *buf,size_t len); struct map_struct *map_file(int fd,OFF_T len); -char *map_ptr(struct map_struct *map,OFF_T offset,int len); +void map_ptr(struct map_struct *map,OFF_T offset,int len); //#ET# void unmap_file(struct map_struct *map); void show_flist_stats(void); int readlink_stat(const char *Path, STRUCT_STAT * Buffer, char *Linkbuf); int link_stat(const char *Path, STRUCT_STAT * Buffer); @@ -164,9 +164,9 @@ void wait_process(pid_t pid, int *status); void start_server(int f_in, int f_out, int argc, char *argv[]); int client_run(int f_in, int f_out, pid_t pid, int argc, char *argv[]); int main(int argc,char *argv[]); -void match_sums(int f,struct sum_struct *s,struct map_struct *buf,OFF_T len); +void match_sums(int f,struct sum_struct *s,struct map_struct *buf); void 
match_report(void); void usage(enum logcode F); void option_error(void); int parse_arguments(int *argc, const char ***argv, int frommain); @@ -216,9 +216,9 @@ void *do_mmap(void *start, int len, int prot, int flags, int fd, OFF_T offset); char *d_name(struct dirent *di); int main (int argc, char *argv[]); void set_compression(char *fname); -void send_token(int f,int token,struct map_struct *buf,OFF_T offset, +int send_token(int f,int token,struct map_struct *buf,OFF_T offset, int n,int toklen); int recv_token(int f,char **data); void see_token(char *data, int toklen); int main(int argc, char **argv); diff -r -u4 rsync-2.5.5/receiver.c rsync-patched/receiver.c --- rsync-2.5.5/receiver.c Wed Feb 13 21:42:20 2002 +++ rsync-patched/receiver.c Mon Aug 5 10:05:15 2002 @@ -35,8 +35,9 @@ extern char *tmpdir; extern char *compare_dest; extern int make_backups; extern char *backup_suffix; +extern int read_devices; static struct delete_list { DEV64_T dev; INO64_T inode; @@ -212,16 +213,17 @@ OFF_T offset2; char *data; static char file_sum1[MD4_SUM_LENGTH]; static char file_sum2[MD4_SUM_LENGTH]; - char *map=NULL; - + rprintf(FINFO, "#ET#<receive_data for %s\n", fname); + count = read_int(f_in); n = read_int(f_in); remainder = read_int(f_in); sum_init(); + rprintf(FINFO, "#ET#<receive_data: got header, reading tokens\n"); for (i=recv_token(f_in,&data); i != 0; i=recv_token(f_in,&data)) { show_progress(offset, total_size); @@ -258,22 +260,29 @@ rprintf(FINFO,"chunk[%d] of size %d at %.0f offset=%.0f\n", i,len,(double)offset2,(double)offset); if (buf) { - map = map_ptr(buf,offset2,len); - - see_token(map, len); - sum_update(map,len); + map_ptr(buf,offset2,len); + if (buf->m_len != len) { + rprintf(FERROR,"chunk read failed on %s : %s\n", + fname,strerror(errno)); + exit_cleanup(RERR_FILEIO); + } + + see_token(buf->m_ptr, len); + sum_update(buf->m_ptr, len); + + if (fd != -1 && write_file(fd,buf->m_ptr,len) != (int) len) { + rprintf(FERROR,"write failed on %s : %s\n", + 
fname,strerror(errno)); + exit_cleanup(RERR_FILEIO); + } } - if (fd != -1 && write_file(fd,map,len) != (int) len) { - rprintf(FERROR,"write failed on %s : %s\n", - fname,strerror(errno)); - exit_cleanup(RERR_FILEIO); - } offset += len; } + rprintf(FINFO, "#ET#<receive_data: got 0 token\n"); end_progress(total_size); if (fd != -1 && offset > 0 && sparse_end(fd) != 0) { rprintf(FERROR,"write failed on %s : %s\n", @@ -347,8 +356,10 @@ file = flist->files[i]; fname = f_name(file); + rprintf(FINFO, "#ET#<recv_files: handling %s\n", fname); + stats.num_transferred_files++; stats.total_transferred_size += file->length; if (local_name) @@ -470,9 +481,11 @@ cleanup_disable(); if (!recv_ok) { - if (csum_length == SUM_LENGTH) { + if (read_devices && (S_ISFIFO(file->mode) || S_ISCHR(file->mode))) { + rprintf(FERROR,"ERROR: failed to transfer content of FIFO or character device %s, cannot redo.\n", fname); + } else if (csum_length == SUM_LENGTH) { rprintf(FERROR,"ERROR: file corruption in %s. File changed during transfer?\n", fname); } else { if (verbose > 1) diff -r -u4 rsync-2.5.5/rsync.h rsync-patched/rsync.h --- rsync-2.5.5/rsync.h Mon Mar 25 10:29:43 2002 +++ rsync-patched/rsync.h Mon Aug 5 10:05:15 2002 @@ -256,8 +256,9 @@ #else #define OFF_T off_t #define STRUCT_STAT struct stat #endif +#define OFF_T_MAX ( ~( ((OFF_T)1) << (sizeof(OFF_T)*8-1) ) ) #if HAVE_OFF64_T #define int64 off64_t #elif (SIZEOF_LONG == 8) @@ -387,8 +388,10 @@ struct map_struct { char *p; int fd,p_size,p_len; OFF_T file_size, p_offset, p_fd_offset; + char* m_ptr; /* start of data from last map_ptr() call */ + int m_len; /* length of data from last map_ptr() call */ }; struct exclude_struct { char *pattern; diff -r -u4 rsync-2.5.5/sender.c rsync-patched/sender.c --- rsync-2.5.5/sender.c Sat Jan 26 02:07:33 2002 +++ rsync-patched/sender.c Mon Aug 5 10:05:15 2002 @@ -17,17 +17,18 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ #include "rsync.h" +#include "limits.h" extern int verbose; extern int remote_version; extern int csum_length; extern struct stats stats; extern int io_error; extern int dry_run; extern int am_server; - +extern int read_devices; /* receive the checksums for a buffer */ @@ -163,8 +164,9 @@ io_error = 1; rprintf(FERROR,"receive_sums failed\n"); return; } + rprintf(FINFO, "#ET#>send_files: got s, s->count=%.0f\n", (double)s->count); if (write_batch) write_batch_csum_info(&i,flist->count,s); @@ -186,17 +188,38 @@ close(fd); return; } + if (read_devices && (S_ISFIFO(st.st_mode) || S_ISCHR(st.st_mode))) { + /* We won't read FIFO files or character devices more than + * once. + * ( We also can't seek in these files, but it so happens + * that as a sender, the buffering code in map_ptr() ensures + * that we'll never seek in the underlying file. + * This is fragile. #ET# */ + if (phase==0 && csum_length<8) + rprintf(FINFO, "Transferring a FIFO file with a short checksum. Cannot redo if this fails!\n"); + else if (phase>0) { + rprintf(FERROR, "Cannot redo a FIFO file or character device. It must succeed in phase 0.\n"); + //#ET#XXX do we need to set some error variable or something? + return; + } + } + if (read_devices && IS_DEVICE(st.st_mode)) { + /* We don't know the length of the content of devices files + * in advance. 
We'll count on map_ptr() to do the EOF detection */ + st.st_size=OFF_T_MAX; + } + if (st.st_size > 0) { buf = map_file(fd,st.st_size); } else { buf = NULL; } if (verbose > 2) - rprintf(FINFO,"send_files mapped %s of size %.0f\n", - fname,(double)st.st_size); + rprintf(FINFO,"send_files mapped %s of size %.0f\n", + fname,(double)st.st_size); write_int(f_out,i); if (write_batch) @@ -258,9 +281,15 @@ fname); continue; } } else { - match_sums(f_out,s,buf,st.st_size); + rprintf(FINFO, "#ET#>Really calling match_sums()\n"); + match_sums(f_out,s,buf); + if (read_devices && (S_ISFIFO(st.st_mode) || S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode))) { + /* For FIFO and device files know the file size only now -- update stats. */ + stats.total_size += buf->file_size; + initial_stats.total_size += buf->file_size; + } log_send(file, &initial_stats); } if (!read_batch) { /* dw */ diff -r -u4 rsync-2.5.5/token.c rsync-patched/token.c --- rsync-2.5.5/token.c Tue Aug 14 05:04:49 2001 +++ rsync-patched/token.c Mon Aug 5 10:05:15 2002 @@ -85,27 +85,32 @@ return n; } -/* non-compressing send token */ -static void simple_send_token(int f,int token, +/* non-compressing send token. returns size of data sent. 
*/ +static int simple_send_token(int f,int token, struct map_struct *buf,OFF_T offset,int n) { extern int write_batch; /* dw */ int hold_int; /* dw */ + int l = 0; if (n > 0) { - int l = 0; while (l < n) { int n1 = MIN(CHUNK_SIZE,n-l); - write_int(f,n1); - write_buf(f,map_ptr(buf,offset+l,n1),n1); + rprintf(FINFO, "#ET#>simple_send_token will do write_buf(f,map_ptr(buf,%.0f+l,%.0f),%.0f)\n",(double)offset,(double)n1,(double)n1); + map_ptr(buf,offset+l,n1); + if (buf->m_len==0) + break; + write_int(f,buf->m_len); + write_buf(f,buf->m_ptr,buf->m_len); if (write_batch) { write_batch_delta_file( (char *) &n1, sizeof(int) ); - write_batch_delta_file(map_ptr(buf,offset+l,n1),n1); + write_batch_delta_file(buf->m_ptr,buf->m_len); } - l += n1; - } + l += buf->m_len; + + }; } /* a -2 token means to send data only and no token */ if (token != -2) { write_int(f,-(token+1)); @@ -113,8 +118,10 @@ hold_int = -(token+1); write_batch_delta_file( (char *) &hold_int, sizeof(int) ); } } + + return l; } /* Flag bytes in compressed stream are encoded as follows: */ @@ -137,17 +144,18 @@ /* Output buffer */ static char *obuf; -/* Send a deflated token */ -static void +/* Send a deflated token. Returns size of data sent (before compression). 
*/ +static int send_deflated_token(int f, int token, struct map_struct *buf, OFF_T offset, int nb, int toklen) { int n, r; static int init_done, flush_pending; extern int write_batch; /* dw */ char temp_byte; /* dw */ + int data_read = 0; if (last_token == -1) { /* initialization */ if (!init_done) { @@ -215,13 +223,15 @@ do { if (tx_strm.avail_in == 0 && nb != 0) { /* give it some more input */ n = MIN(nb, CHUNK_SIZE); - tx_strm.next_in = (Bytef *) - map_ptr(buf, offset, n); - tx_strm.avail_in = n; - nb -= n; - offset += n; + + map_ptr(buf, offset, n); + tx_strm.next_in = (Bytef *)buf->m_ptr; + tx_strm.avail_in = buf->m_len; + nb -= buf->m_len; + offset += buf->m_len; + data_read += buf->m_len; } if (tx_strm.avail_out == 0) { tx_strm.next_out = (Bytef *)(obuf + 2); tx_strm.avail_out = MAX_DATA_COUNT; @@ -276,10 +286,12 @@ } else if (token != -2) { /* add the data in the current block to the compressor's history and hash table */ - tx_strm.next_in = (Bytef *) map_ptr(buf, offset, toklen); - tx_strm.avail_in = toklen; + map_ptr(buf, offset, toklen); + data_read += buf->m_len; + tx_strm.next_in = (Bytef *) buf->m_ptr; + tx_strm.avail_in = buf->m_len; tx_strm.next_out = (Bytef *) obuf; tx_strm.avail_out = MAX_DATA_COUNT; r = deflate(&tx_strm, Z_INSERT_ONLY); if (r != Z_OK || tx_strm.avail_in != 0) { @@ -287,8 +299,9 @@ r, tx_strm.avail_in); exit_cleanup(RERR_STREAMIO); } } + return data_read; } /* tells us what the receiver is in the middle of doing */ @@ -477,16 +490,17 @@ /* * transmit a verbatim buffer of length n followed by a token * If token == -1 then we have reached EOF * If n == 0 then don't send a buffer + * Returns number of data bytes sent. 
*/ -void send_token(int f,int token,struct map_struct *buf,OFF_T offset, +int send_token(int f,int token,struct map_struct *buf,OFF_T offset, int n,int toklen) { if (!do_compression) { - simple_send_token(f,token,buf,offset,n); + return simple_send_token(f,token,buf,offset,n); } else { - send_deflated_token(f, token, buf, offset, n, toklen); + return send_deflated_token(f, token, buf, offset, n, toklen); } }
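The EOF handling described in point 2 (ask for a fixed number of bytes, accept fewer only when the stream ends) can be illustrated in isolation. What follows is a minimal sketch, not code from the patch: read_until_eof() and demo_short_read() are hypothetical names, and the EINTR retry and the -1 return on a read error are assumptions added here that the patched map_ptr() loop does not make.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of rsync): read up to `want` bytes from fd,
 * retrying after short reads and stopping early only at EOF.
 * Returns the number of bytes stored in dst, or -1 on a read error. */
static ssize_t read_until_eof(int fd, char *dst, size_t want)
{
    size_t got = 0;

    assert(dst != NULL);
    while (got < want) {
        ssize_t n = read(fd, dst + got, want - got);
        if (n == 0)
            break;                  /* EOF: the caller gets fewer bytes */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, try again */
            return -1;              /* real error (assumed policy) */
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Hypothetical demo: a pipe holding 5 bytes is asked for 8.
 * Returns the byte count actually delivered, or -1 on setup failure. */
static long demo_short_read(void)
{
    int fds[2];
    char buf[8];
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "hello", 5) != 5) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                  /* reader will see EOF after 5 bytes */
    n = read_until_eof(fds[0], buf, sizeof buf);
    close(fds[0]);
    return (long)n;
}
```

A caller in the style of the patch would treat the returned count the way the patched code treats m_len: use exactly that many bytes, and take a count shorter than requested as the signal that the real file size is now known.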
That's an interesting patch and I can see how the feature would be very useful. I am happy to add it to the contributed patches/ directory. I am very hesitant to add new options at this stage without some systematic way to test and maintain them in the future. -- Martin
On Mon, Aug 05, 2002 at 10:28:38AM +0300, Eran Tromer wrote:
> Greetings,
>
> I'd like to propose a new option to rsync, which causes it to read
> device files as if they were regular files. This includes pipes,
> character devices and block devices (I'm not sure about sockets). The
> main motivation is cases where you need to synchronize a large amount of
> data that is not available as regular files, as in the following scenarios:
>
> * Keep a copy of a block device
> http://rsync.samba.org/cgi-bin/rsync.fom?auth=ckdf646d354e73&editCmds=compact&file=87
>
> * Make backups using a huge tar file instead of actual files, to prevent
> need to be root in order to preserve attributes:
> http://rsync.samba.org/cgi-bin/rsync.fom?auth=ckdf646d354e73&editCmds=compact&file=228
>
> * Keep an SQL-formatted dump of database tables (i.e., output of
> mysqldump) for backup purposes
>
> In all three cases, using a temporary file may require unreasonable
> amounts of additional diskspace, and there is no good reason to mandate it.
>
> Attached is a largish patch against rsync-2.5.5 that adds this feature.
> The primary issues are as follows:
>
> 1. Addition of a "--read-devices" option for activating the device
> reading behavior, and acting accordingly in a few locations.
>
> 2. The code and usage of map_ptr() required the file size to be known in
> advance. This is impossible for devices and FIFOs.

And for decompressed data --- I have started to modify rsync and to rewrite the sender to do only sequential reads. But at the moment I've got stuck and the weather was fine..... :-)

> To address this, I
> changed the interface of map_ptr. It now returns void rather than char*,
> and updates two new members of map_struct:
> - m_ptr contains the pointer to the new data
> - m_len contains the length of the new data
> (at most the length requested, but less if EOF encountered)
> Also, the map_struct->file_size field is now updated by map_ptr() when
> it encounters an EOF.
> I updated all invocations of map_ptr() to use the new interface, and act
> correctly when a premature EOF is encountered. Apart from supporting the
> new feature, this makes the code more robust to EOFs (no more zeroing
> buffer memory and hoping for the best!) and, arguably, more elegant. The
> change affected many locations, but is pretty straightforward.
> I tried to do the minimal changes rather than eliminate all traces of
> the predetermined length assumption, since that would greatly inflate
> the patch. Also, this shouldn't prevent map_ptr() from being a wrapper
> to mmap() in the future, though the map_ptr() code will need to contain
> a fallback to read().

My intention was to run the rsync checksum algorithm on a decoded (gzip) data stream. For this I had to get rid of all possible calls of seek() in map_ptr. I will have a look at your code before continuing with my attempt.

> 4. FIFOs and character devices do not support the seek operation. It so
> happens that the combination of code in map_ptr() and the way it's used
> guarantees that the transmitter always reads the underlying file
> sequentially without seeks, for reasonable block sizes. This seems to be
> the case both from looking at the code and from testing it. Alas, this
> property is fragile and may break with changes in constants or code. I
> don't see how this can be fixed without pulling out all the map_ptr() code.

I think you are right, and for this reason I started to rewrite the sender. If I hadn't written the generator patch, the modifications to the generator to work with files of unknown size would have been trivial. I think solving both problems will lead to a protocol change.

cu, Stefan
--
Stefan Nehlsen | ParlaNet Administration | sn@parlanet.de | +49 431 988-1260
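The "sequential reads only" property discussed in point 4 can be made explicit rather than incidental. The following is a rough sketch under stated assumptions, and is neither rsync's map_ptr() nor the sender rewrite mentioned above: struct fwd_window, fwd_init(), fwd_get() and WIN_CAP are hypothetical names. The window only ever slides forward, and any request that would need a seek is refused instead of being attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define WIN_CAP 64                  /* hypothetical window capacity */

struct fwd_window {
    int fd;
    char data[WIN_CAP];
    size_t start;                   /* stream offset of data[0] */
    size_t len;                     /* valid bytes in data[] */
    int eof;                        /* set once read() reports EOF or fails */
};

static void fwd_init(struct fwd_window *w, int fd)
{
    w->fd = fd;
    w->start = 0;
    w->len = 0;
    w->eof = 0;
}

/* Make bytes [offset, offset+want) available at *out.
 * Returns how many are available (fewer than want only once the stream
 * has ended), or -1 if the request would need a seek: backwards, or
 * forwards past data that has not been read yet. */
static long fwd_get(struct fwd_window *w, size_t offset, size_t want,
                    const char **out)
{
    size_t drop;

    assert(w != NULL && out != NULL);
    if (want > WIN_CAP || offset < w->start || offset > w->start + w->len)
        return -1;

    drop = offset - w->start;       /* bytes the caller has moved past */
    memmove(w->data, w->data + drop, w->len - drop);
    w->start = offset;
    w->len -= drop;

    while (w->len < want && !w->eof) {
        ssize_t n = read(w->fd, w->data + w->len, want - w->len);
        if (n <= 0)
            w->eof = 1;             /* errors are treated like EOF here */
        else
            w->len += (size_t)n;
    }
    *out = w->data;
    return (long)(w->len < want ? w->len : want);
}

/* Hypothetical demo on a 10-byte pipe; returns 1 if every step behaves
 * as described in the comments, 0 otherwise. */
static int demo_forward_only(void)
{
    int fds[2];
    int ok;
    struct fwd_window w;
    const char *p = NULL;

    if (pipe(fds) != 0)
        return 0;
    if (write(fds[1], "abcdefghij", 10) != 10) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }
    close(fds[1]);
    fwd_init(&w, fds[0]);

    ok = fwd_get(&w, 0, 4, &p) == 4 && memcmp(p, "abcd", 4) == 0  /* first block */
      && fwd_get(&w, 2, 4, &p) == 4 && memcmp(p, "cdef", 4) == 0  /* overlap, forward */
      && fwd_get(&w, 1, 4, &p) == -1                              /* backward: refused */
      && fwd_get(&w, 6, 8, &p) == 4 && memcmp(p, "ghij", 4) == 0; /* short at EOF */
    close(fds[0]);
    return ok;
}
```

The design choice is that a violation of the forward-only assumption shows up as an explicit error return rather than as a failed lseek() deep inside the read path.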
On Mon, Aug 05, 2002 at 10:28:38AM +0300, Eran Tromer wrote:
> Greetings,
>
> I'd like to propose a new option to rsync, which causes it to read
> device files as if they were regular files. This includes pipes,
> character devices and block devices (I'm not sure about sockets). The
> main motivation is cases where you need to synchronize a large amount of
> data that is not available as regular files, as in the following scenarios:
>
> * Keep a copy of a block device
> http://rsync.samba.org/cgi-bin/rsync.fom?auth=ckdf646d354e73&editCmds=compact&file=87
>
> * Make backups using a huge tar file instead of actual files, to prevent
> need to be root in order to preserve attributes:
> http://rsync.samba.org/cgi-bin/rsync.fom?auth=ckdf646d354e73&editCmds=compact&file=228
>
> * Keep an SQL-formatted dump of database tables (i.e., output of
> mysqldump) for backup purposes
>
> In all three cases, using a temporary file may require unreasonable
> amounts of additional diskspace, and there is no good reason to mandate it.

I'm not that familiar with rsync code at the lowest level, but I'm guessing this patch adds support for in-place patching, possibly without reverse seeks in the basis. How does this work? Does it have an adverse effect on the delta size?

Have I got this wrong and it just allows the source to be a device, still requiring a temporary file for target creation?

--
----------------------------------------------------------------------
ABO: finger abo@minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------
Eeek. I've just noticed that I've mailed the wrong diff -- this one has a lot of rprintf() debugging cruft, and also a small bug. If you'd like a clean version, just ask.

Eran

Eran Tromer wrote:
> Greetings,
>
> I'd like to propose a new option to rsync, which causes it to read
> device files as if they were regular files. This includes pipes,
> character devices and block devices (I'm not sure about sockets). The
> main motivation is cases where you need to synchronize a large amount of
> data that is not available as regular files, as in the following scenarios:
[snip]
Similar patches have been submitted before but they've always been rejected. I'm sorry that you spent so much time on this one, although perhaps it's been useful to you so far. As you know, rsync won't be able to write to such devices because it needs to work on a temporary copy; that limits the usefulness of reading from them greatly. I'm not at all convinced that it's worth supporting in rsync.

I can't read your examples in the rsync.fom right now because it's broken, so I don't understand your motivating examples. How can you rsync into a tar file, and what do SQL databases have to do with device files?

- Dave Dykstra

On Mon, Aug 05, 2002 at 10:28:38AM +0300, Eran Tromer wrote:
> Greetings, > > I'd like to propose a new option to rsync, which causes it to read > device files as if they were regular files. This includes pipes, > character devices and block devices (I'm not sure about sockets). The > main motivation is cases where you need to synchronize a large amount of > data that is not available as regular files, as in the following scenarios: > > * Keep a copy of a block device > http://rsync.samba.org/cgi-bin/rsync.fom?auth=ckdf646d354e73&editCmds=compact&file=87 > > * Make backups using a huge tar file instead of actual files, to prevent > need to be root in order to preserve attributes: > http://rsync.samba.org/cgi-bin/rsync.fom?auth=ckdf646d354e73&editCmds=compact&file=228 > > * Keep an SQL-formatted dump of database tables (i.e., output of > mysqldump) for backup purposes > > In all three cases, using a temporary file may require unreasonable > amounts of additional diskspace, and there is no good reason to mandate it. > > Attached is a largish patch against rsync-2.5.5 that adds this feature. > The primary issues are as follows: > > 1. Addition of a "--read-devices" option for activating the device > reading behavior, and acting accordingly in a few locations. > > 2. The code and usage of map_ptr() required the file size to be known in > advance. 
This is impossible for devices and FIFOs. To address this, I > changed the interface of map_ptr. It now returns void rather than char*, > and updates two new members of map_struct: > - m_ptr contains the pointer to the new data > - m_len contains the length of the new data > (at most the length requested, but less if EOF encountered) > Also, the map_struct->file_size field is now updated by map_ptr() when > it encounters an EOF. > I updated all invocations of map_ptr() to use the new interface, and act > correctly when a premature EOF is encountered. Apart from supporting the > new feature, this makes the code more robust to EOFs (no more zeroing > buffer memory and hoping for the best!) and, arguably, more elegant. The > change affected many locations, but is pretty straightforward. > I tried to do the minimal changes rather than eliminate all traces of > the predetermined length assumption, since that would greatly inflate > the patch. Also, this shouldn't prevent map_ptr() from being a wrapper > to mmap() in the future, though the map_ptr() code will need to contain > a fallback to read(). > > 3. Character devices and FIFO files can be read only once, so we can't > "redo" them in the adaptive csum_length scheme (I think that it's safe > to assume that block devices can be read multiple times, though). Thus > such files must always be transferred with a generous csum_length, say, > at least 8 bytes. I'm not sure how to handle this protocol-wise; as a > temporary measure, I increased the initial csum_length from 2 to 8 if > the --read-devices argument is given. This option is passed to the > server, so older rsyncs will choke on it rather than fail due to a > protocol mismatch. > > 4. FIFOs and character devices do not support the seek operation. It so > happens that the combination of code in map_ptr() and the way it's used > guarantees that the transmitter always reads the underlying file > sequentially without seeks, for reasonable block sizes. 
This seems to be > the case both from looking at the code and from testing it. Alas, this > property is fragile and may break with changes in constants or code. I > don't see how this can be fixed without pulling out all the map_ptr() code. > > 5. A few minor changes were required to keep the statistics straight. > For instance, the token sending functions now return the number of data > bytes they read. > > A few things I wasn't sure about are marked by "#ET#". > > I hope it would be possible to incorporate this patch into rsync, after > proper review and testing. > > Regards, > Eran Tromer> diff -r -u4 rsync-2.5.5/checksum.c rsync-patched/checksum.c > --- rsync-2.5.5/checksum.c Tue Oct 26 01:04:09 1999 > +++ rsync-patched/checksum.c Mon Aug 5 10:05:15 2002 > @@ -24,8 +24,9 @@ > #define CSUM_CHUNK 64 > > int checksum_seed = 0; > extern int remote_version; > +extern int read_devices; > > /* > a simple 32 bit checksum that can be upadted from either end > (inspired by Mark Adler's Adler-32 checksum) > @@ -73,43 +74,40 @@ > > for(i = 0; i + CSUM_CHUNK <= len; i += CSUM_CHUNK) { > mdfour_update(&m, (uchar *)(buf1+i), CSUM_CHUNK); > } > - if (len - i > 0) { > + if (len - i > 0) { /* make this ">=" to address http://lists.samba.org/pipermail/rsync/2002-August/008011.html */ > mdfour_update(&m, (uchar *)(buf1+i), (len-i)); > } > > mdfour_result(&m, (uchar *)sum); > } > > > -void file_checksum(char *fname,char *sum,OFF_T size) > +void file_checksum(char *fname,char *sum) > { > - OFF_T i; > + OFF_T i = 0; > struct map_struct *buf; > int fd; > - OFF_T len = size; > char tmpchunk[CSUM_CHUNK]; > struct mdfour m; > > memset(sum,0,MD4_SUM_LENGTH); > > fd = do_open(fname, O_RDONLY, 0); > if (fd == -1) return; > > - buf = map_file(fd,size); > + buf = map_file(fd,OFF_T_MAX); /* map_ptr() will find out the real file size */ > > mdfour_begin(&m); > > - for(i = 0; i + CSUM_CHUNK <= len; i += CSUM_CHUNK) { > - memcpy(tmpchunk, map_ptr(buf,i,CSUM_CHUNK), CSUM_CHUNK); > - 
mdfour_update(&m, (uchar *)tmpchunk, CSUM_CHUNK); > - } > - > - if (len - i > 0) { > - memcpy(tmpchunk, map_ptr(buf,i,len-i), len-i); > - mdfour_update(&m, (uchar *)tmpchunk, (len-i)); > - } > + do { > + map_ptr(buf,i,CSUM_CHUNK); > + memcpy(tmpchunk, buf->m_ptr, buf->m_len); > + mdfour_update(&m, (uchar *)tmpchunk, buf->m_len); > + i += CSUM_CHUNK; > + } while (buf->m_len>0); > + /* mdfour_update(&m, NULL, 0); uncomment to address http://lists.samba.org/pipermail/rsync/2002-August/008011.html */ > > mdfour_result(&m, (uchar *)sum); > > close(fd); > @@ -118,12 +116,19 @@ > > > void checksum_init(void) > { > - if (remote_version >= 14) > - csum_length = 2; /* adaptive */ > - else > - csum_length = SUM_LENGTH; > + if (remote_version >= 14) { > + /* adaptively increasing checksum length, relying on multiple phases: */ > + if (read_devices) > + csum_length = MIN(8,SUM_LENGTH); > + /* Because streams must succeed on first attempt. This is a protocol change! > + It would be better to allow checksum lengths to differ between files. #ET# XXX */ > + else > + csum_length = 2; > + } else > + csum_length = SUM_LENGTH; > + rprintf(FINFO, "#ET# csum_length=%d\n", csum_length); > } > > > > @@ -171,9 +176,9 @@ > } > > void sum_end(char *sum) > { > - if (sumresidue) { > + if (sumresidue) { /* remove the test to address http://lists.samba.org/pipermail/rsync/2002-August/008011.html */ > mdfour_update(&md, (uchar *)sumrbuf, sumresidue); > } > > mdfour_result(&md, (uchar *)sum); > Only in rsync-patched: config.h > diff -r -u4 rsync-2.5.5/fileio.c rsync-patched/fileio.c > --- rsync-2.5.5/fileio.c Sat Jan 26 02:07:34 2002 > +++ rsync-patched/fileio.c Mon Aug 5 10:05:15 2002 > @@ -111,32 +111,43 @@ > map->p_size = 0; > map->p_offset = 0; > map->p_fd_offset = 0; > map->p_len = 0; > + map->m_ptr = NULL; > + map->m_len = 0; > > return map; > } > > -/* slide the read window in the file */ > -char *map_ptr(struct map_struct *map,OFF_T offset,int len) > +/* slide the read window in the file. 
> + * The state of the window is saved in the map->p* members. > + * A subrange of this window corresponding to the request is > + * saved in the map->m_* members, for the use of the caller. */ > +void map_ptr(struct map_struct *map,OFF_T offset,int len) > { > - int nread; > OFF_T window_start, read_start; > int window_size, read_size, read_offset; > + if (verbose>3) rprintf(FINFO, "#ET# map_ptr(%d,%.0f,%.0f)\n",(int)map,(double)offset,(double)len); > > - if (len == 0) { > - return NULL; > + /* produce a NULL result if past (currently known) end of file */ > + if (len == 0 || offset>=map->file_size) { > + map->m_ptr = NULL; > + map->m_len = 0; > + return; > } > > - /* can't go beyond the end of file */ > + /* limit read length to (currently known) end of file */ > if (len > (map->file_size - offset)) { > len = map->file_size - offset; > } > > /* in most cases the region will already be available */ > if (offset >= map->p_offset && > offset+len <= map->p_offset+map->p_len) { > - return (map->p + (offset - map->p_offset)); > + map->m_ptr = map->p + (offset - map->p_offset); > + map->m_len = len; > + if (verbose>3) rprintf(FINFO, "#ET# map_ptr: already in buf; p_offset=%.0f p_len=%.0f m_ptr-p=%.0f m_len=%.0f\n", (double)map->p_offset, (double)map->p_len, (double)(map->m_ptr-map->p), (double)map->m_len); > + return; > } > > > /* nope, we are going to have to do a read. Work out our desired window */ > @@ -175,32 +186,48 @@ > read_size = window_size; > read_offset = 0; > } > > + if (verbose>2) rprintf(FINFO, "#ET# read_size=%.0f\n", (double)read_size); > if (read_size <= 0) { > - rprintf(FINFO,"Warning: unexpected read size of %d in map_ptr\n", read_size); > + rprintf(FERROR,"Warning: unexpected read size of %d in map_ptr\n", read_size); > + /* Should never happen. */ > + } else if (read_start > map->file_size) { > + rprintf(FERROR,"Warning: read start is past known file size in map_ptr\n"); > + /* Should never happen. Expect havoc if streams are involved. 
*/ > } else { > - if (map->p_fd_offset != read_start) { > + int nread = 0; > + if (map->p_fd_offset != read_start) { /* seek */ > + rprintf(FINFO, "#ET# seeking; p_fd_offset=%.0f read_start=%.0f\n",(double)map->p_fd_offset,(double)read_start); > if (do_lseek(map->fd,read_start,SEEK_SET) != read_start) { > rprintf(FERROR,"lseek failed in map_ptr\n"); > + /* Specifically, this will always happen if we seek in FIFO > + * streams. That should never happen. */ > exit_cleanup(RERR_FILEIO); > } > map->p_fd_offset = read_start; > } > - > - if ((nread=read(map->fd,map->p + read_offset,read_size)) != read_size) { > - if (nread < 0) nread = 0; > - /* the best we can do is zero the buffer - the file > - has changed mid transfer! */ > - memset(map->p+read_offset+nread, 0, read_size - nread); > + /* Try getting len bytes, giving up only if EOF. */ > + while (nread<read_size) { > + int nread1 = read(map->fd,map->p + read_offset+nread,read_size-nread); > + if (nread1==0) { > + /* EOF. For streams, only now we know the real file length. */ > + if (map->file_size != map->p_fd_offset && verbose > 1) > + rprintf(FINFO, "Encountered EOF after %.0f bytes\n", (double)map->p_fd_offset); > + map->file_size = map->p_fd_offset; > + window_size = map->p_fd_offset-window_start; > + break; > + } > + nread += nread1; > + map->p_fd_offset += nread1; > } > - map->p_fd_offset += nread; > } > > map->p_offset = window_start; > map->p_len = window_size; > > - return map->p + (offset - map->p_offset); > + map->m_ptr = map->p + (offset - map->p_offset); > + map->m_len = MIN(len, map->p_offset+map->p_len-offset); > } > > > void unmap_file(struct map_struct *map) > diff -r -u4 rsync-2.5.5/flist.c rsync-patched/flist.c > --- rsync-2.5.5/flist.c Fri Mar 15 00:20:20 2002 > +++ rsync-patched/flist.c Mon Aug 5 10:05:15 2002 > @@ -729,9 +729,9 @@ > /* drat. 
we have to provide a null checksum for non-regular > files in order to be compatible with earlier versions > of rsync */ > if (S_ISREG(st.st_mode)) { > - file_checksum(fname, file->sum, st.st_size); > + file_checksum(fname, file->sum); > } else { > memset(file->sum, 0, MD4_SUM_LENGTH); > } > } > @@ -747,9 +747,9 @@ > } else { > file->basedir = NULL; > } > > - if (!S_ISDIR(st.st_mode)) > + if (S_ISREG(st.st_mode)) > stats.total_size += st.st_size; > > return file; > } > diff -r -u4 rsync-2.5.5/generator.c rsync-patched/generator.c > --- rsync-2.5.5/generator.c Mon Mar 25 08:54:31 2002 > +++ rsync-patched/generator.c Mon Aug 5 10:05:15 2002 > @@ -39,8 +39,9 @@ > extern int io_timeout; > extern int remote_version; > extern int always_checksum; > extern int modify_window; > +extern int read_devices; > extern char *compare_dest; > > > /* choose whether to skip a particular file */ > @@ -63,9 +64,9 @@ > compare_dest,fname); > fname = fnamecmpdest; > } > } > - file_checksum(fname,sum,st->st_size); > + file_checksum(fname,sum); > if (remote_version < 21) { > return (memcmp(sum,file->sum,2) == 0); > } else { > return (memcmp(sum,file->sum,MD4_SUM_LENGTH) == 0); > @@ -119,8 +120,9 @@ > write_buf(f_out, s->sums[i].sum2, csum_length); > } > } else { > /* we don't have checksums */ > + rprintf(FINFO, "#ET# send_sums: don't have checksums\n"); > write_int(f_out, 0); > write_int(f_out, block_size); > write_int(f_out, 0); > } > @@ -193,24 +195,23 @@ > s->sums = (struct sum_buf *)malloc(sizeof(s->sums[0])*s->count); > if (!s->sums) out_of_memory("generate_sums"); > > for (i=0;i<count;i++) { > - int n1 = MIN(len,n); > - char *map = map_ptr(buf,offset,n1); > + map_ptr(buf,offset,n); > > - s->sums[i].sum1 = get_checksum1(map,n1); > - get_checksum2(map,n1,s->sums[i].sum2); > + s->sums[i].sum1 = get_checksum1(buf->m_ptr,buf->m_len); > + get_checksum2(buf->m_ptr,buf->m_len,s->sums[i].sum2); > > s->sums[i].offset = offset; > - s->sums[i].len = n1; > + s->sums[i].len = buf->m_len; > 
s->sums[i].i = i; > > if (verbose > 3) > rprintf(FINFO,"chunk[%d] offset=%.0f len=%d sum1=%08x\n", > i,(double)s->sums[i].offset,s->sums[i].len,s->sums[i].sum1); > > - len -= n1; > - offset += n1; > + len -= buf->m_len; > + offset += buf->m_len; > } > > return s; > } > @@ -245,8 +246,9 @@ > if (verbose > 2) > rprintf(FINFO,"recv_generator(%s,%d)\n",fname,i); > > statret = link_stat(fname,&st); > + { int errsave=errno; rprintf(FINFO, "#ET# recv_generator: statret==%d, errno=%d\n", statret, errno); errno=errsave; } > > if (only_existing && statret == -1 && errno == ENOENT) { > /* we only want to update existing files */ > if (verbose > 1) rprintf(FINFO, "not creating new file \"%s\"\n",fname); > @@ -337,9 +339,9 @@ > return; > } > > #ifdef HAVE_MKNOD > - if (am_root && preserve_devices && IS_DEVICE(file->mode)) { > + if (!read_devices && am_root && preserve_devices && IS_DEVICE(file->mode)) { > if (statret != 0 || > st.st_mode != file->mode || > st.st_rdev != file->rdev) { > delete_file(fname); > @@ -361,15 +363,23 @@ > #endif > > if (preserve_hard_links && check_hard_link(file)) { > if (verbose > 1) > - rprintf(FINFO, "recv_generator: \"%s\" is a hard link\n",f_name(file)); > + rprintf(FINFO, "recv_generator: \"%s\" is a hard link\n",fname); > return; > } > > if (!S_ISREG(file->mode)) { > - rprintf(FINFO, "skipping non-regular file \"%s\"\n",fname); > - return; > + if (read_devices && (S_ISFIFO(file->mode) || S_ISBLK(file->mode) || S_ISCHR(file->mode))) { > + if (verbose > 1) { > + int saveerrno = errno; /* here and elsewhere -- should be replaced by explicitly saving errno after stat. 
#ET#XXX */ > + rprintf(FINFO,"%s is a FIFO or device, copying its content to a regular file.\n", f_name(file)); > + errno = saveerrno; > + } > + } else { > + rprintf(FINFO, "skipping non-regular file \"%s\"\n",fname); > + return; > + } > } > > fnamecmp = fname; > > @@ -385,8 +395,9 @@ > else > fnamecmp = fnamecmpbuf; > } > > + { int errsave=errno; rprintf(FINFO, "#ET# recv_generator: statret==%d, errno=%d\n", statret, errno); errno=errsave; } > if (statret == -1) { > if (errno == ENOENT) { > write_int(f_out,i); > if (!dry_run) send_sums(NULL,f_out); > @@ -433,8 +444,9 @@ > return; > } > > if (disable_deltas_p()) { > + rprintf(FINFO, "#ET# recv_generator: disable_deltas_p()==TRUE!\n"); > write_int(f_out,i); > send_sums(NULL,f_out); > return; > } > diff -r -u4 rsync-2.5.5/match.c rsync-patched/match.c > --- rsync-2.5.5/match.c Sun Feb 3 04:38:39 2002 > +++ rsync-patched/match.c Mon Aug 5 10:05:15 2002 > @@ -95,26 +95,27 @@ > { > OFF_T n = offset - last_match; > OFF_T j; > > + > if (verbose > 2 && i >= 0) > rprintf(FINFO,"match at %.0f last_match=%.0f j=%d len=%d n=%.0f\n", > (double)offset,(double)last_match,i,s->sums[i].len,(double)n); > > - send_token(f,i,buf,last_match,n,i<0?0:s->sums[i].len); > - data_transfer += n; > + data_transfer += send_token(f,i,buf,last_match,n,i<0?0:s->sums[i].len);; > > if (i >= 0) { > stats.matched_data += s->sums[i].len; > n += s->sums[i].len; > } > - > for (j=0;j<n;j+=CHUNK_SIZE) { > int n1 = MIN(CHUNK_SIZE,n-j); > - sum_update(map_ptr(buf,last_match+j,n1),n1); > + if (verbose>2) rprintf(FINFO, "#ET#>matched() calling sum_update(buf,%.0f,%.0f)\n", (double)last_match+j,(double)n1); > + map_ptr(buf,last_match+j,n1); > + if (buf->m_len==0) break; > + sum_update(buf->m_ptr,buf->m_len); > } > > - > if (i >= 0) > last_match = offset + s->sums[i].len; > else > last_match = offset; > @@ -127,45 +128,41 @@ > } > > > static void hash_search(int f,struct sum_struct *s, > - struct map_struct *buf,OFF_T len) > + struct map_struct *buf) > { > - 
OFF_T offset, end; > + OFF_T offset; > int j,k, last_i; > char sum2[SUM_LENGTH]; > uint32 s1, s2, sum; > - schar *map; > > /* last_i is used to encourage adjacent matches, allowing the RLL coding of the > output to work more efficiently */ > last_i = -1; > > if (verbose > 2) > - rprintf(FINFO,"hash search b=%d len=%.0f\n",s->n,(double)len); > + rprintf(FINFO,"hash search b=%d\n",s->n); > > - k = MIN(len, s->n); > - > - map = (schar *)map_ptr(buf,0,k); > - > - sum = get_checksum1((char *)map, k); > + map_ptr(buf,0,s->n); > + k = buf->m_len; > + sum = get_checksum1(buf->m_ptr, k); > s1 = sum & 0xFFFF; > s2 = sum >> 16; > if (verbose > 3) > rprintf(FINFO, "sum=%.8x k=%d\n", sum, k); > > offset = 0; > > - end = len + 1 - s->sums[s->count-1].len; > - > if (verbose > 3) > - rprintf(FINFO,"hash search s->n=%d len=%.0f count=%d\n", > - s->n,(double)len,s->count); > + rprintf(FINFO,"hash search s->n=%d count=%d\n", > + s->n,s->count); > > - do { > + while (k >= s->sums[s->count-1].len) { /* look for a matching chunk as long as it might exist */ > tag t = gettag2(s1,s2); > int done_csum2 = 0; > - > + if (verbose>4) rprintf(FINFO, "#ET#>hash_search: iteration, offset=%.0f\n", (double)offset); > + > j = tag_table[t]; > if (verbose > 4) > rprintf(FINFO,"offset=%.0f sum=%08x\n",(double)offset,sum); > > @@ -175,23 +172,26 @@ > > sum = (s1 & 0xffff) | (s2 << 16); > tag_hits++; > for (; j < (int) s->count && targets[j].t == t; j++) { > - int l, i = targets[j].i; > + int i = targets[j].i; > > if (sum != s->sums[i].sum1) continue; > > /* also make sure the two blocks are the same length */ > - l = MIN(s->n,len-offset); > - if (l != s->sums[i].len) continue; > + if (s->sums[i].len != k) continue; /* This is more restrictive than before stream support. 
#ET#*/ > > if (verbose > 3) > rprintf(FINFO,"potential match at %.0f target=%d %d sum=%08x\n", > (double)offset,j,i,sum); > > if (!done_csum2) { > - map = (schar *)map_ptr(buf,offset,l); > - get_checksum2((char *)map,l,sum2); > + map_ptr(buf,offset,k); > + if (buf->m_len != k) { > + rprintf(FERROR,"error while reading block for checksum2 calculation, offset=%.0f len=%d\n", (double)offset, k); > + break; > + } > + get_checksum2(buf->m_ptr,k,sum2); > done_csum2 = 1; > } > > if (memcmp(sum2,s->sums[i].sum2,csum_length) != 0) { > @@ -213,29 +213,33 @@ > } > } > > last_i = i; > - > + > + if (verbose) rprintf(FINFO, "#ET#>hash_search: block match at offset=%.0f\n", (double)offset); > matched(f,s,buf,offset,i); > offset += s->sums[i].len - 1; > - k = MIN((len-offset), s->n); > - map = (schar *)map_ptr(buf,offset,k); > - sum = get_checksum1((char *)map, k); > + > + map_ptr(buf,offset,s->n); > + k = buf->m_len; > + sum = get_checksum1(buf->m_ptr, k); > + > s1 = sum & 0xFFFF; > s2 = sum >> 16; > matches++; > break; > } > > null_tag: > /* Trim off the first byte from the checksum */ > - map = (schar *)map_ptr(buf,offset,k+1); > - s1 -= map[0] + CHAR_OFFSET; > - s2 -= k * (map[0]+CHAR_OFFSET); > + map_ptr(buf,offset,k+1); > + if (buf->m_len==0) break; /* encountered EOF */ > + s1 -= buf->m_ptr[0] + CHAR_OFFSET; > + s2 -= k * (buf->m_ptr[0]+CHAR_OFFSET); > > /* Add on the next byte (if there is one) to the checksum */ > - if (k < (len-offset)) { > - s1 += (map[k]+CHAR_OFFSET); > + if (buf->m_len>k) { > + s1 += (buf->m_ptr[k]+CHAR_OFFSET); > s2 += s1; > } else { > --k; > } > @@ -247,19 +251,23 @@ > running match, the checksum update and the > literal send. 
*/ > if (offset > last_match && > offset-last_match >= CHUNK_SIZE+s->n && > - (end-offset > CHUNK_SIZE)) { > + (buf->file_size - offset > CHUNK_SIZE)) { > + rprintf(FINFO, "#ET#>hash_search: writing literal data at offset %.0f\n", (double)offset); > matched(f,s,buf,offset - s->n, -2); > } > - } while (++offset < end); > + ++offset; > + } > > - matched(f,s,buf,len,-1); > - map_ptr(buf,len-1,1); > + rprintf(FINFO, "#ET#>hash_search: writing -1 token\n"); > + matched(f,s,buf,buf->file_size,-1); > + > + /*#ET# Why was this needed? map_ptr(buf,len-1,1); */ > } > > > -void match_sums(int f,struct sum_struct *s,struct map_struct *buf,OFF_T len) > +void match_sums(int f,struct sum_struct *s,struct map_struct *buf) > { > char file_sum[MD4_SUM_LENGTH]; > extern int write_batch; /* dw */ > > @@ -269,27 +277,33 @@ > matches=0; > data_transfer=0; > > sum_init(); > + rprintf(FINFO, "#ET#>match_sums(), s->count=%.0f, csum_length=%d\n", (double)s->count, csum_length); > > - if (len > 0 && s->count>0) { > + if (buf==NULL) { > + rprintf(FINFO, "#ET#>match_sums() sends an empty file (buf=NULL)\n"); > + matched(f,s,NULL,0,-1); > + } else if (buf->file_size>0 && s->count>0) { > build_hash_table(s); > > if (verbose > 2) > rprintf(FINFO,"built hash table\n"); > > - hash_search(f,s,buf,len); > + hash_search(f,s,buf); > > if (verbose > 2) > rprintf(FINFO,"done hash search\n"); > } else { > OFF_T j; > + rprintf(FINFO, "#ET#>match_sums() will send literal data\n"); > /* by doing this in pieces we avoid too many seeks */ > - for (j=0;j<(len-CHUNK_SIZE);j+=CHUNK_SIZE) { > - int n1 = MIN(CHUNK_SIZE,(len-CHUNK_SIZE)-j); > - matched(f,s,buf,j+n1,-2); > + for (j=0; j+CHUNK_SIZE < buf->file_size; j+=CHUNK_SIZE) { > + rprintf(FINFO, "#ET#>PIECE AT OFFSET %lld\n", j); > + matched(f,s,buf,j+CHUNK_SIZE,-2); > } > - matched(f,s,buf,len,-1); > + rprintf(FINFO, "#ET#>match_sums: writing -1 token\n"); > + matched(f,s,buf,buf->file_size,-1); > } > > sum_end(file_sum); > > diff -r -u4 rsync-2.5.5/options.c 
rsync-patched/options.c > --- rsync-2.5.5/options.c Tue Mar 19 23:16:42 2002 > +++ rsync-patched/options.c Mon Aug 5 10:05:15 2002 > @@ -86,8 +86,9 @@ > #else > int modify_window=0; > #endif > int blocking_io=-1; > +int read_devices=0; > > /** Network address family. **/ > #ifdef INET6 > int default_af_hint = 0; /* Any protocol */ > @@ -218,8 +219,9 @@ > rprintf(F," -p, --perms preserve permissions\n"); > rprintf(F," -o, --owner preserve owner (root only)\n"); > rprintf(F," -g, --group preserve group\n"); > rprintf(F," -D, --devices preserve devices (root only)\n"); > + rprintf(F," --read-devices read content of device and fifo files\n"); > rprintf(F," -t, --times preserve times\n"); > rprintf(F," -S, --sparse handle sparse files efficiently\n"); > rprintf(F," -n, --dry-run show what would have been transferred\n"); > rprintf(F," -W, --whole-file copy whole files, no incremental checks\n"); > @@ -286,9 +288,9 @@ > OPT_LOG_FORMAT, OPT_PASSWORD_FILE, OPT_SIZE_ONLY, OPT_ADDRESS, > OPT_DELETE_AFTER, OPT_EXISTING, OPT_MAX_DELETE, OPT_BACKUP_DIR, > OPT_IGNORE_ERRORS, OPT_BWLIMIT, OPT_BLOCKING_IO, > OPT_NO_BLOCKING_IO, OPT_WHOLE_FILE, OPT_NO_WHOLE_FILE, > - OPT_MODIFY_WINDOW, OPT_READ_BATCH, OPT_WRITE_BATCH, OPT_IGNORE_EXISTING}; > + OPT_MODIFY_WINDOW, OPT_READ_BATCH, OPT_WRITE_BATCH, OPT_IGNORE_EXISTING, OPT_READ_DEVICES}; > > static struct poptOption long_options[] = { > /* longName, shortName, argInfo, argPtr, value, descrip, argDesc */ > {"version", 0, POPT_ARG_NONE, 0, OPT_VERSION, 0, 0}, > @@ -360,8 +362,9 @@ > {"backup-dir", 0, POPT_ARG_STRING, &backup_dir , 0, 0, 0 }, > {"hard-links", 'H', POPT_ARG_NONE, &preserve_hard_links , 0, 0, 0 }, > {"read-batch", 0, POPT_ARG_STRING, &batch_prefix, OPT_READ_BATCH, 0, 0 }, > {"write-batch", 0, POPT_ARG_STRING, &batch_prefix, OPT_WRITE_BATCH, 0, 0 }, > + {"read-devices", 0, POPT_ARG_NONE, &read_devices, 0, 0, 0 }, > #ifdef INET6 > {0, '4', POPT_ARG_VAL, &default_af_hint, AF_INET , 0, 0 }, > {0, '6', POPT_ARG_VAL, 
&default_af_hint, AF_INET6 , 0, 0 },
> #endif
> @@ -590,8 +593,16 @@
> " write-batch or read-batch\n");
> return 0;
> }
>
> + if (preserve_devices && read_devices) {
> + snprintf(err_buf,sizeof(err_buf),
> + "devices and read-devices can not be used together\n");
> + rprintf(FERROR,"ERROR: devices and read-devices"
> + " can not be used together\n");
> + return 0;
> + }
> +
> *argv = poptGetArgs(pc);
> if (*argv)
> *argc = count_args(*argv);
> else
> @@ -766,8 +777,16 @@
>
> if (opt_ignore_existing && am_sender)
> args[ac++] = "--ignore-existing";
>
> + if (read_devices) {
> + args[ac++] = "--read-devices";
> + /* older versions will choke on this, which is OK since
> + * this option causes an increase in csum_length. See note about
> + * this in checksum.c. #ET#
> + */
> + }
> +
> if (tmpdir) {
> args[ac++] = "--temp-dir";
> args[ac++] = tmpdir;
> }
> diff -r -u4 rsync-2.5.5/proto.h rsync-patched/proto.h
> --- rsync-2.5.5/proto.h Mon Mar 25 06:51:17 2002
> +++ rsync-patched/proto.h Mon Aug 5 10:05:15 2002
> @@ -25,9 +25,9 @@
> void show_flist(int index, struct file_struct **fptr);
> void show_argvs(int argc, char *argv[]);
> uint32 get_checksum1(char *buf1,int len);
> void get_checksum2(char *buf,int len,char *sum);
> -void file_checksum(char *fname,char *sum,OFF_T size);
> +void file_checksum(char *fname,char *sum);
> void checksum_init(void);
> void sum_init(void);
> void sum_update(char *p,int len);
> void sum_end(char *sum);
> @@ -71,9 +71,9 @@
> void add_cvs_excludes(void);
> int sparse_end(int f);
> int write_file(int f,char *buf,size_t len);
> struct map_struct *map_file(int fd,OFF_T len);
> -char *map_ptr(struct map_struct *map,OFF_T offset,int len);
> +void map_ptr(struct map_struct *map,OFF_T offset,int len); /* #ET# */
> void unmap_file(struct map_struct *map);
> void show_flist_stats(void);
> int readlink_stat(const char *Path, STRUCT_STAT * Buffer, char *Linkbuf);
> int link_stat(const char *Path, STRUCT_STAT * Buffer);
> @@ -164,9 +164,9 @@
> void
wait_process(pid_t pid, int *status);
> void start_server(int f_in, int f_out, int argc, char *argv[]);
> int client_run(int f_in, int f_out, pid_t pid, int argc, char *argv[]);
> int main(int argc,char *argv[]);
> -void match_sums(int f,struct sum_struct *s,struct map_struct *buf,OFF_T len);
> +void match_sums(int f,struct sum_struct *s,struct map_struct *buf);
> void match_report(void);
> void usage(enum logcode F);
> void option_error(void);
> int parse_arguments(int *argc, const char ***argv, int frommain);
> @@ -216,9 +216,9 @@
> void *do_mmap(void *start, int len, int prot, int flags, int fd, OFF_T offset);
> char *d_name(struct dirent *di);
> int main (int argc, char *argv[]);
> void set_compression(char *fname);
> -void send_token(int f,int token,struct map_struct *buf,OFF_T offset,
> +int send_token(int f,int token,struct map_struct *buf,OFF_T offset,
> int n,int toklen);
> int recv_token(int f,char **data);
> void see_token(char *data, int toklen);
> int main(int argc, char **argv);
> diff -r -u4 rsync-2.5.5/receiver.c rsync-patched/receiver.c
> --- rsync-2.5.5/receiver.c Wed Feb 13 21:42:20 2002
> +++ rsync-patched/receiver.c Mon Aug 5 10:05:15 2002
> @@ -35,8 +35,9 @@
> extern char *tmpdir;
> extern char *compare_dest;
> extern int make_backups;
> extern char *backup_suffix;
> +extern int read_devices;
>
> static struct delete_list {
> DEV64_T dev;
> INO64_T inode;
> @@ -212,16 +213,17 @@
> OFF_T offset2;
> char *data;
> static char file_sum1[MD4_SUM_LENGTH];
> static char file_sum2[MD4_SUM_LENGTH];
> - char *map=NULL;
> -
> + rprintf(FINFO, "#ET#<receive_data for %s\n", fname);
> +
> count = read_int(f_in);
> n = read_int(f_in);
> remainder = read_int(f_in);
>
> sum_init();
>
> + rprintf(FINFO, "#ET#<receive_data: got header, reading tokens\n");
> for (i=recv_token(f_in,&data); i != 0; i=recv_token(f_in,&data)) {
>
> show_progress(offset, total_size);
>
> @@ -258,22 +260,29 @@
> rprintf(FINFO,"chunk[%d] of size %d at %.0f offset=%.0f\n",
>
i,len,(double)offset2,(double)offset);
>
> if (buf) {
> - map = map_ptr(buf,offset2,len);
> -
> - see_token(map, len);
> - sum_update(map,len);
> + map_ptr(buf,offset2,len);
> + if (buf->m_len != len) {
> + rprintf(FERROR,"chunk read failed on %s : %s\n",
> + fname,strerror(errno));
> + exit_cleanup(RERR_FILEIO);
> + }
> +
> + see_token(buf->m_ptr, len);
> + sum_update(buf->m_ptr, len);
> +
> + if (fd != -1 && write_file(fd,buf->m_ptr,len) != (int) len) {
> + rprintf(FERROR,"write failed on %s : %s\n",
> + fname,strerror(errno));
> + exit_cleanup(RERR_FILEIO);
> + }
> }
>
> - if (fd != -1 && write_file(fd,map,len) != (int) len) {
> - rprintf(FERROR,"write failed on %s : %s\n",
> - fname,strerror(errno));
> - exit_cleanup(RERR_FILEIO);
> - }
> offset += len;
> }
>
> + rprintf(FINFO, "#ET#<receive_data: got 0 token\n");
> end_progress(total_size);
>
> if (fd != -1 && offset > 0 && sparse_end(fd) != 0) {
> rprintf(FERROR,"write failed on %s : %s\n",
> @@ -347,8 +356,10 @@
>
> file = flist->files[i];
> fname = f_name(file);
>
> + rprintf(FINFO, "#ET#<recv_files: handling %s\n", fname);
> +
> stats.num_transferred_files++;
> stats.total_transferred_size += file->length;
>
> if (local_name)
> @@ -470,9 +481,11 @@
>
> cleanup_disable();
>
> if (!recv_ok) {
> - if (csum_length == SUM_LENGTH) {
> + if (read_devices && (S_ISFIFO(file->mode) || S_ISCHR(file->mode))) {
> + rprintf(FERROR,"ERROR: failed to transfer content of FIFO or character device %s, cannot redo.\n", fname);
> + } else if (csum_length == SUM_LENGTH) {
> rprintf(FERROR,"ERROR: file corruption in %s.
File changed during transfer?\n",
> fname);
> } else {
> if (verbose > 1)
> diff -r -u4 rsync-2.5.5/rsync.h rsync-patched/rsync.h
> --- rsync-2.5.5/rsync.h Mon Mar 25 10:29:43 2002
> +++ rsync-patched/rsync.h Mon Aug 5 10:05:15 2002
> @@ -256,8 +256,9 @@
> #else
> #define OFF_T off_t
> #define STRUCT_STAT struct stat
> #endif
> +#define OFF_T_MAX ( ~( ((OFF_T)1) << (sizeof(OFF_T)*8-1) ) )
>
> #if HAVE_OFF64_T
> #define int64 off64_t
> #elif (SIZEOF_LONG == 8)
> @@ -387,8 +388,10 @@
> struct map_struct {
> char *p;
> int fd,p_size,p_len;
> OFF_T file_size, p_offset, p_fd_offset;
> + char* m_ptr; /* start of data from last map_ptr() call */
> + int m_len; /* length of data from last map_ptr() call */
> };
>
> struct exclude_struct {
> char *pattern;
> diff -r -u4 rsync-2.5.5/sender.c rsync-patched/sender.c
> --- rsync-2.5.5/sender.c Sat Jan 26 02:07:33 2002
> +++ rsync-patched/sender.c Mon Aug 5 10:05:15 2002
> @@ -17,17 +17,18 @@
> Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> */
>
> #include "rsync.h"
> +#include <limits.h>
>
> extern int verbose;
> extern int remote_version;
> extern int csum_length;
> extern struct stats stats;
> extern int io_error;
> extern int dry_run;
> extern int am_server;
> -
> +extern int read_devices;
>
> /*
> receive the checksums for a buffer
> */
> @@ -163,8 +164,9 @@
> io_error = 1;
> rprintf(FERROR,"receive_sums failed\n");
> return;
> }
> + rprintf(FINFO, "#ET#>send_files: got s, s->count=%.0f\n", (double)s->count);
>
> if (write_batch)
> write_batch_csum_info(&i,flist->count,s);
>
> @@ -186,17 +188,38 @@
> close(fd);
> return;
> }
>
> + if (read_devices && (S_ISFIFO(st.st_mode) || S_ISCHR(st.st_mode))) {
> + /* We won't read FIFO files or character devices more than
> + * once.
> + * We also can't seek in these files, but it so happens
> + * that as a sender, the buffering code in map_ptr() ensures
> + * that we'll never seek in the underlying file.
> + * This is fragile.
#ET# */
> + if (phase==0 && csum_length<8)
> + rprintf(FINFO, "Transferring a FIFO file with a short checksum. Cannot redo if this fails!\n");
> + else if (phase>0) {
> + rprintf(FERROR, "Cannot redo a FIFO file or character device. It must succeed in phase 0.\n");
> + /* #ET#XXX do we need to set some error variable or something? */
> + return;
> + }
> + }
> + if (read_devices && IS_DEVICE(st.st_mode)) {
> + /* We don't know the length of the content of device files
> + * in advance. We'll count on map_ptr() to do the EOF detection */
> + st.st_size=OFF_T_MAX;
> + }
> +
> if (st.st_size > 0) {
> buf = map_file(fd,st.st_size);
> } else {
> buf = NULL;
> }
>
> if (verbose > 2)
> - rprintf(FINFO,"send_files mapped %s of size %.0f\n",
> - fname,(double)st.st_size);
> + rprintf(FINFO,"send_files mapped %s of size %.0f\n",
> + fname,(double)st.st_size);
>
> write_int(f_out,i);
>
> if (write_batch)
> @@ -258,9 +281,15 @@
> fname);
> continue;
> }
> } else {
> - match_sums(f_out,s,buf,st.st_size);
> + rprintf(FINFO, "#ET#>Really calling match_sums()\n");
> + match_sums(f_out,s,buf);
> + if (read_devices && (S_ISFIFO(st.st_mode) || S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode))) {
> + /* For FIFO and device files we know the file size only now -- update stats. */
> + stats.total_size += buf->file_size;
> + initial_stats.total_size += buf->file_size;
> + }
> log_send(file, &initial_stats);
> }
>
> if (!read_batch) { /* dw */
> diff -r -u4 rsync-2.5.5/token.c rsync-patched/token.c
> --- rsync-2.5.5/token.c Tue Aug 14 05:04:49 2001
> +++ rsync-patched/token.c Mon Aug 5 10:05:15 2002
> @@ -85,27 +85,32 @@
> return n;
> }
>
>
> -/* non-compressing send token */
> -static void simple_send_token(int f,int token,
> +/* non-compressing send token. returns size of data sent.
*/
> +static int simple_send_token(int f,int token,
> struct map_struct *buf,OFF_T offset,int n)
> {
> extern int write_batch; /* dw */
> int hold_int; /* dw */
> + int l = 0;
>
> if (n > 0) {
> - int l = 0;
> while (l < n) {
> int n1 = MIN(CHUNK_SIZE,n-l);
> - write_int(f,n1);
> - write_buf(f,map_ptr(buf,offset+l,n1),n1);
> + rprintf(FINFO, "#ET#>simple_send_token will do write_buf(f,map_ptr(buf,%.0f+l,%.0f),%.0f)\n",(double)offset,(double)n1,(double)n1);
> + map_ptr(buf,offset+l,n1);
> + if (buf->m_len==0)
> + break;
> + write_int(f,buf->m_len);
> + write_buf(f,buf->m_ptr,buf->m_len);
> if (write_batch) {
> - write_batch_delta_file( (char *) &n1, sizeof(int) );
> + write_batch_delta_file( (char *) &buf->m_len, sizeof(int) );
> - write_batch_delta_file(map_ptr(buf,offset+l,n1),n1);
> + write_batch_delta_file(buf->m_ptr,buf->m_len);
> }
> - l += n1;
> - }
> + l += buf->m_len;
> +
> + }
> }
> /* a -2 token means to send data only and no token */
> if (token != -2) {
> write_int(f,-(token+1));
> @@ -113,8 +118,10 @@
> hold_int = -(token+1);
> write_batch_delta_file( (char *) &hold_int, sizeof(int) );
> }
> }
> +
> + return l;
> }
>
>
> /* Flag bytes in compressed stream are encoded as follows: */
> @@ -137,17 +144,18 @@
>
> /* Output buffer */
> static char *obuf;
>
> -/* Send a deflated token */
> -static void
> +/* Send a deflated token. Returns size of data sent (before compression).
*/
> +static int
> send_deflated_token(int f, int token,
> struct map_struct *buf, OFF_T offset, int nb, int toklen)
> {
> int n, r;
> static int init_done, flush_pending;
> extern int write_batch; /* dw */
> char temp_byte; /* dw */
> + int data_read = 0;
>
> if (last_token == -1) {
> /* initialization */
> if (!init_done) {
> @@ -215,13 +223,15 @@
> do {
> if (tx_strm.avail_in == 0 && nb != 0) {
> /* give it some more input */
> n = MIN(nb, CHUNK_SIZE);
> - tx_strm.next_in = (Bytef *)
> - map_ptr(buf, offset, n);
> - tx_strm.avail_in = n;
> - nb -= n;
> - offset += n;
> +
> + map_ptr(buf, offset, n);
> + tx_strm.next_in = (Bytef *)buf->m_ptr;
> + tx_strm.avail_in = buf->m_len;
> + nb -= buf->m_len;
> + offset += buf->m_len;
> + data_read += buf->m_len;
> }
> if (tx_strm.avail_out == 0) {
> tx_strm.next_out = (Bytef *)(obuf + 2);
> tx_strm.avail_out = MAX_DATA_COUNT;
> @@ -276,10 +286,12 @@
>
> } else if (token != -2) {
> /* add the data in the current block to the compressor's
> history and hash table */
> - tx_strm.next_in = (Bytef *) map_ptr(buf, offset, toklen);
> - tx_strm.avail_in = toklen;
> + map_ptr(buf, offset, toklen);
> + data_read += buf->m_len;
> + tx_strm.next_in = (Bytef *) buf->m_ptr;
> + tx_strm.avail_in = buf->m_len;
> tx_strm.next_out = (Bytef *) obuf;
> tx_strm.avail_out = MAX_DATA_COUNT;
> r = deflate(&tx_strm, Z_INSERT_ONLY);
> if (r != Z_OK || tx_strm.avail_in != 0) {
> @@ -287,8 +299,9 @@
> r, tx_strm.avail_in);
> exit_cleanup(RERR_STREAMIO);
> }
> }
> + return data_read;
> }
>
>
> /* tells us what the receiver is in the middle of doing */
> @@ -477,16 +490,17 @@
> /*
> * transmit a verbatim buffer of length n followed by a token
> * If token == -1 then we have reached EOF
> * If n == 0 then don't send a buffer
> + * Returns number of data bytes sent.
> */
> -void send_token(int f,int token,struct map_struct *buf,OFF_T offset,
> +int send_token(int f,int token,struct map_struct *buf,OFF_T offset,
> int n,int toklen)
> {
> if (!do_compression) {
> - simple_send_token(f,token,buf,offset,n);
> + return simple_send_token(f,token,buf,offset,n);
> } else {
> - send_deflated_token(f, token, buf, offset, n, toklen);
> + return send_deflated_token(f, token, buf, offset, n, toklen);
> }
> }
>
>
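For reviewers, the calling convention that the patch introduces for map_ptr() (no return value; the caller inspects m_ptr and m_len, and file_size becomes known only once EOF is hit) can be sketched in isolation roughly as follows. This is only an illustration under simplifying assumptions, not code from the patch: demo_map, demo_map_init(), demo_map_ptr() and demo_drain() are hypothetical names, the reader is strictly sequential, requests are limited to one small buffer, and a read error is simply treated like EOF.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, cut-down stand-in for rsync's map_struct. */
struct demo_map {
	int fd;           /* descriptor to read from; may be a pipe */
	off_t pos;        /* offset of the next unread byte */
	off_t file_size;  /* -1 until EOF has been seen */
	char buf[4096];   /* window buffer */
	char *m_ptr;      /* start of data from the last demo_map_ptr() call */
	int m_len;        /* bytes actually available at m_ptr */
};

void demo_map_init(struct demo_map *m, int fd)
{
	m->fd = fd;
	m->pos = 0;
	m->file_size = -1;
	m->m_ptr = m->buf;
	m->m_len = 0;
}

/* Fetch up to len bytes at offset. Results are reported through
 * m->m_ptr and m->m_len; m_len < len means EOF was reached, and
 * file_size is filled in at that point. Sequential access only. */
void demo_map_ptr(struct demo_map *m, off_t offset, int len)
{
	int got = 0;

	assert(offset == m->pos);
	assert(len >= 0 && len <= (int)sizeof(m->buf));

	while (got < len) {
		ssize_t r = read(m->fd, m->buf + got, (size_t)(len - got));
		if (r < 0 && errno == EINTR)
			continue;
		if (r <= 0) {
			/* EOF (or, in this sketch, any error): only now
			 * do we learn where the data ends. */
			m->file_size = m->pos + got;
			break;
		}
		got += (int)r;
	}
	m->pos += got;
	m->m_ptr = m->buf;
	m->m_len = got;
}

/* Caller pattern: keep asking for chunks until a short read. */
off_t demo_drain(struct demo_map *m, int chunk)
{
	for (;;) {
		demo_map_ptr(m, m->pos, chunk);
		/* a real caller would consume m->m_ptr[0..m_len) here */
		if (m->m_len < chunk)
			break;
	}
	return m->pos;
}
```

The point of the sketch is that every caller has to look at m_len rather than assume it got the length it asked for, which is the same check the patched receive_data() and simple_send_token() perform.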