John R. LoVerso
2003-May-23 00:37 UTC
PATCH: better handling for write failures (disk full)
[I sent this the other day, but it never got approved for the list]

I've been having problems trying to sync two small partitions (128MB) that are usually nearly full. The rsync would fail with this cryptic error:

  rsync: writefd_unbuffered failed to write 4 bytes: phase "unknown": Broken pipe
  rsync error: error in rsync protocol data stream (code 12) at io.c(515)

It turns out that if rsync gets a write error while transferring a file (such as when you fill up a partition), it just fails. That's because the code in receive_data() just calls exit_cleanup() upon a write error, which bombs out the receiver. For whatever reason, the sender doesn't handle this gracefully, and in turn aborts.

People have "worked around" this problem by using "-T", putting the received file in temp space (assuming it's big enough). That allows the receive to complete without error (avoiding the abort). Then finish_transfer() will copy the file from the temp space, overwriting the destination file using copy_file(), which usually avoids filling the destination partition. Thus, using "-T" is the only way to transfer into a nearly full partition.

However, there is one other problem: if copy_file() fills up the destination partition, it will fail. At least it gives a valid error message with "-vv" and doesn't abort the whole sync:

  renaming /var/tmp/.x1.OApztf to dest/x1
  write dest/x1: No space left on device
  copy /var/tmp/.x1.OApztf -> dest/x1 : No space left on device

However, in this case, it never removes the truncated destination file, leaving the destination partition full (and guaranteeing that nothing else will transfer). In my case, if it fills the small partition, I don't want the partially transferred file around. If I did, I'd have specified the "--partial" option.

Here are two patches ("these work for me, YMMV"):

  receiver.c: upon a write error, discard the rest of the
      current file transfer and keep working.  Don't give up.
      Do generate a useful error message.
  rsync.c: if using -T but not --partial, remove a partial result
      when a write error occurs

Perhaps the changes in receive_data() could specifically just target ENOSPC, on the assumption that any other write error is fatal.

I'm also using John Van Essen's write_file() patch from:

  http://lists.samba.org/pipermail/rsync/2003-April/010511.html

John

diff -Nru a/rsync/receiver.c b/rsync/receiver.c
--- a/rsync/receiver.c	Tue May 20 08:56:43 2003
+++ b/rsync/receiver.c	Tue May 20 08:56:43 2003
@@ -214,6 +214,7 @@
 	static char file_sum1[MD4_SUM_LENGTH];
 	static char file_sum2[MD4_SUM_LENGTH];
 	char *map=NULL;
+	int discard = 0;
 
 	count = read_int(f_in);
 	n = read_int(f_in);
@@ -240,7 +241,9 @@
 
 			if (fd != -1 && write_file(fd,data,i) != i) {
 				rprintf(FERROR,"write failed on %s : %s\n",fname,strerror(errno));
-				exit_cleanup(RERR_FILEIO);
+				discard = 1;
+				fd = -1;
+				// exit_cleanup(RERR_FILEIO);
 			}
 			offset += i;
 			continue;
@@ -268,7 +271,9 @@
 		if (fd != -1 && write_file(fd,map,len) != (int) len) {
 			rprintf(FERROR,"write failed on %s : %s\n",
 				fname,strerror(errno));
-			exit_cleanup(RERR_FILEIO);
+			discard = 1;
+			fd = -1;
+			// exit_cleanup(RERR_FILEIO);
 		}
 		offset += len;
 	}
@@ -278,7 +283,9 @@
 	if (fd != -1 && offset > 0 && sparse_end(fd) != 0) {
 		rprintf(FERROR,"write failed on %s : %s\n",
 			fname,strerror(errno));
-		exit_cleanup(RERR_FILEIO);
+		discard = 1;
+		fd = -1;
+		// exit_cleanup(RERR_FILEIO);
 	}
 
 	sum_end(file_sum1);
@@ -293,6 +300,8 @@
 			return 0;
 		}
 	}
+	if (discard)
+		return 2;
 	return 1;
 }
 
@@ -458,6 +467,16 @@
 			close(fd1);
 		}
 		close(fd2);
+
+		/*
+		 * This means a write error occured, and the file is discarded
+		 */
+		if (recv_ok == 2) {
+			if (verbose > 2)
+				rprintf(FINFO,"discarding %s\n",fname);
+			do_unlink(fnametmp);
+			cleanup_disable();
+		} else {
 
 		if (verbose > 2)
 			rprintf(FINFO,"renaming %s to %s\n",fnametmp,fname);
@@ -476,6 +495,7 @@
 				write_int(f_gen,i);
 			}
 		}
+		}
 	}
 
 	if (delete_after) {
diff -Nru a/rsync/rsync.c b/rsync/rsync.c
--- a/rsync/rsync.c	Tue May 20 08:56:43 2003
+++ b/rsync/rsync.c	Tue May 20 08:56:43 2003
@@ -243,8 +243,14 @@
 		/* rename failed on cross-filesystem link.
 		   Copy the file instead. */
 		if (copy_file(fnametmp,fname, file->mode & INITACCESSPERMS)) {
-			rprintf(FERROR,"copy %s -> %s : %s\n",
+			int err = errno;
+			extern int keep_partial;
+			rprintf(FERROR,"error copy %s -> %s : %s\n",
 				fnametmp,fname,strerror(errno));
+			/* remove partial result if disk full */
+			if (err == ENOSPC && !keep_partial) {
+				(void)unlink(fname);
+			}
 		} else {
 			set_perms(fname,file,NULL,0);
 		}
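
The ENOSPC-only variant suggested before the patch might look roughly like the sketch below. It is not part of the posted patch: the enum and the function classify_write_error() are hypothetical names invented here, and the only assumption about rsync is that the caller saves errno right after the failed write, as the rsync.c hunk does with "err".

```c
#include <errno.h>

/* Hypothetical outcome codes for illustration; these are not rsync symbols. */
enum write_outcome {
	WRITE_OK = 0,		/* no error was recorded */
	WRITE_DISCARD = 1,	/* drop the current file, keep transferring */
	WRITE_FATAL = 2		/* give up, as the unpatched code does */
};

/*
 * Decide how to react to a failed write.  "err" is the errno value the
 * caller saved immediately after the write call returned short.
 */
static enum write_outcome classify_write_error(int err)
{
	if (err == 0)
		return WRITE_OK;
	if (err == ENOSPC)
		return WRITE_DISCARD;	/* disk full: only this file is lost */
	return WRITE_FATAL;		/* any other write error stays fatal */
}
```

In receive_data(), a caller could set discard = 1 only when this returns WRITE_DISCARD and keep calling exit_cleanup(RERR_FILEIO) for WRITE_FATAL.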
On Thu, May 22, 2003 at 10:37:07AM -0400, John R. LoVerso wrote:
> Here are two patches ("these work for me, YMMV"):

What are they against? They will probably apply, but there has been considerable change in CVS judging by the line numbers.

> receiver.c: upon a write error, discard the rest of the
> current file transfer and keep working. don't give up.
> do generate a useful error message.

A useful error message is good. I'm less certain that we should proceed. We have enough problems with needing to pore over the transfer report to find the cause of the "some files could not be transferred" error message. I don't know that we want to add to it.

Perhaps even more important, if one file fails due to a full filesystem it is likely others will. What you propose will result in rsync continuing a broken transfer. This kind of stumbling along is likely to progressively fill the destination, whereas failing tends to back off, leaving the admin some room to maneuver. If this is accepted there should be some sort of sanity check like --max-delete provides on deletes.

> rsync.c: if using -T but not --partial, remove a partial result
> when a write error occurs

Good idea. Separate issue, I think.

> Perhaps the changes in receive_data() could specifically just target
> ENOSPC, on the assumption that any other write error is fatal.
>
> I'm also using John Van Essen's write_file() patch from:
> http://lists.samba.org/pipermail/rsync/2003-April/010511.html

That patch has now been committed. Thanks for the reminder.
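
One possible shape for the sanity check mentioned above, by analogy with --max-delete, is a cap on how many files may be discarded before the transfer aborts. This is only a sketch: struct discard_limit, note_discard() and the idea of a matching command-line limit are hypothetical and do not exist in rsync or in the posted patch.

```c
#include <stdbool.h>

/* Hypothetical limit tracker; nothing here is an rsync symbol. */
struct discard_limit {
	int max_discards;	/* 0 means: abort on the first write failure */
	int discards;		/* files discarded so far in this run */
};

/*
 * Record one discarded file.  Returns true while the transfer may
 * continue, false once the limit has been exceeded and the caller
 * should abort instead of stumbling along.
 */
static bool note_discard(struct discard_limit *lim)
{
	lim->discards++;
	return lim->discards <= lim->max_discards;
}
```

With max_discards set to 0 this reproduces the current behaviour (the first write failure is fatal), so the limit could default to the existing semantics.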