Ben Bolker
2012-Jan-27 17:03 UTC
[Rd] misfeature: forced file.copy() of a file over itself truncates the file ...
Try this:

    fn <- "tmp.dat"
    x <- 1:3
    dump("x",file=fn)
    file.info(fn) ## 9 bytes
    file.copy(paste("./",fn,sep=""),fn,overwrite=TRUE)
    file.info(fn) ## 0 bytes (!!)

Normally file.copy() checks for and disallows overwriting a file with itself, but it only checks whether the character string 'from' is the same as the character string 'to', not whether the two refer to the same file by different names, so it lets this go ahead. It then creates a new file with the name of 'to' using file.create():

  'file.create' creates files with the given names if they do not already exist and truncates them if they do.

This trashes the existing 'from' file (which was not detected). file.copy() then happily appends the contents of 'from' (which is now empty) to 'to' ...

I don't know whether there's any simple way to fix this, or whether it's just a case of "don't do that". It might be worth mentioning in the documentation:

  `file.copy' will normally refuse to copy a file to itself, but in cases where the same file is referred to by different names (as in copying "/full/path/to/filename" to "filename" in the current working directory), it will truncate the file to zero length.

Now that I write that, it really seems like a 'mis-feature'.

On a Unix system I would probably compare inodes, but I don't know if there's a good system-independent way to test file identity ...

    $ ls -i tmp.dat
    114080 tmp.dat
    $ ls -i /home/bolker/R/pkgs/r2jags/pkg/tests/tmp.dat
    114080 /home/bolker/R/pkgs/r2jags/pkg/tests/tmp.dat

Would normalizePath() work for this ... ?

    > normalizePath("tmp.dat")
    [1] "/mnt/hgfs/bolker/Documents/R/pkgs/r2jags/pkg/tests/tmp.dat"

sincerely
Ben Bolker
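The two steps that lose the data can be replayed by hand. This is a minimal sketch, run in a temporary directory, that assumes (as described above) the copy amounts to file.create() on 'to' followed by file.append() of 'from'. It does not call file.copy() itself, so it behaves the same whatever the installed R version's file.copy() does.

```r
## Sketch: replay the create-then-append sequence described above on one
## file reached through two different spellings of its path.
old_wd <- setwd(tempdir())
fn <- "tmp.dat"
x <- 1:3
dump("x", file = fn)
from <- paste0("./", fn)    # same file, different name
to <- fn

size_before <- file.info(fn)$size
print(from == to)           # FALSE: a plain string comparison misses the alias

invisible(file.create(to))  # truncates 'to', which is also 'from'
size_after_create <- file.info(fn)$size

invisible(file.append(to, from))  # appends the now-empty 'from' onto 'to'
size_final <- file.info(fn)$size

unlink(fn)
setwd(old_wd)
print(c(before = size_before, after_create = size_after_create,
        final = size_final))
```

The data is already gone after the file.create() step; the append has nothing left to copy.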
William Dunlap
2012-Jan-27 17:47 UTC
[Rd] misfeature: forced file.copy() of a file over itself truncates the file ...
Since the problem can only occur if the 'to' file exists, a check like

    if (any(normalizePath(from) == normalizePath(to))) {
        stop("'from' and 'to' files are the same")
    }

(after verifying that 'to' and 'from' exist) would avoid the problem.

S+ has a function, match.path, that can say whether two paths refer to the same file (on Unixen it compares inode and device numbers; on Windows it compares the output of normalizePath). That avoids automounter/NFS problems like the following. We have a Unix machine that has two names, "sea-union" and "seabldlnx3201", and the /nfs directory contains both names. At the shell (on a second Linux machine) we can see they refer to the same place:

    % pwd
    /nfs/sea-union
    % ls -id usr /nfs/seabldlnx3201/usr /nfs/sea-union/usr
    358337 /nfs/seabldlnx3201/usr/
    358337 /nfs/sea-union/usr/
    358337 usr/
    % df usr /nfs/seabldlnx3201/usr /nfs/sea-union/usr
    Filesystem          1K-blocks    Used Available Use% Mounted on
    sea-union:/usr       15385888 3526656  11077664  25% /nfs/sea-union/usr
    seabldlnx3201:/usr   15385888 3526656  11077664  25% /nfs/seabldlnx3201/usr
    sea-union:/usr       15385888 3526656  11077664  25% /nfs/sea-union/usr

S+'s match.path also indicates that they are the same:

    S+> getwd()
    [1] "/nfs/sea-union"
    S+> match.path( c("usr", "/nfs/seabldlnx3201/usr"), "/nfs/sea-union/usr")
    [1] 1 1

(The last indicates that both paths in the first argument match the path in the second, as match() does for strings.)
But R's normalizePath() would lead you to think that they are different directories:

    > getwd()
    [1] "/nfs/sea-union"
    > normalizePath(c("usr", "/nfs/seabldlnx3201/usr", "/nfs/sea-union/usr"))
    [1] "/nfs/sea-union/usr"     "/nfs/seabldlnx3201/usr" "/nfs/sea-union/usr"

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
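Bill's normalizePath() check, and the match()-like interface he describes for S+'s match.path, can be sketched in a few lines of R. The names path_key(), match_path() and safe_file_copy() below are hypothetical (they are not existing R or S+ functions), and because everything keys on normalizePath(), the sketch has the same NFS blind spot shown above: it only sees through "./", ".." and symbolic links.

```r
## Hypothetical helpers (not part of base R or S+), sketched from the
## discussion above. Both key on normalizePath(), so they see through
## "./", ".." and symbolic links but NOT automounter/NFS aliases.

## Canonical form used for comparisons; mustWork = FALSE avoids
## warnings for paths that do not exist yet.
path_key <- function(p) normalizePath(p, winslash = "/", mustWork = FALSE)

## match.path()-style lookup: for each element of 'x', the index of the
## first entry of 'table' with the same canonical path, else NA.
match_path <- function(x, table) match(path_key(x), path_key(table))

## file.copy() wrapper that refuses to copy an existing file onto itself.
## Assumes 'to' holds file paths (not a target directory) and has the
## same length as 'from'.
safe_file_copy <- function(from, to, overwrite = FALSE, ...) {
  stopifnot(length(from) == length(to))
  both_exist <- file.exists(from) & file.exists(to)
  same <- path_key(from[both_exist]) == path_key(to[both_exist])
  if (any(same)) stop("'from' and 'to' files are the same")
  file.copy(from, to, overwrite = overwrite, ...)
}

## Usage in a scratch directory
old_wd <- setwd(tempdir())
dir.create("usr", showWarnings = FALSE)
idx <- match_path(c("usr", "./usr", "no-such-dir"), file.path(getwd(), "usr"))

x <- 1:3
dump("x", file = "tmp.dat")
size_before <- file.info("tmp.dat")$size
msg <- tryCatch(safe_file_copy("./tmp.dat", "tmp.dat", overwrite = TRUE),
                error = conditionMessage)
size_after <- file.info("tmp.dat")$size
copied <- safe_file_copy("tmp.dat", "copy.dat", overwrite = TRUE)

unlink(c("usr", "tmp.dat", "copy.dat"), recursive = TRUE)
setwd(old_wd)
print(idx)
```

The wrapper compares 'from' and 'to' pairwise rather than with %in%, so a multi-file call such as copying a to b and b to c is not rejected just because one file appears on both sides.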
Ben Bolker
2012-Jan-31 19:16 UTC
[Rd] misfeature: forced file.copy() of a file over itself truncates the file ...
Ben Bolker <bbolker <at> gmail.com> writes:

Bump. Will I be scolded if I submit this as a bug report/wishlist item?

Test case:

    > fn <- "tmp.dat"
    > x <- 1:3
    > dump("x",file=fn)
    > file.info(fn) ## 9 bytes
    > file.copy(paste("./",fn,sep=""),fn,overwrite=TRUE)
    > file.info(fn) ## 0 bytes (!!)
    >
    > [snip]

My proposed fix (thanks to W. Dunlap) is to use normalizePath(); as he points out, this won't catch situations where the same file can be referred to via an NFS mount, but it should help at least. Writing a platform-independent version à la S-PLUS's match.path() seemed like too much work at the moment.

    ===================================================================
    --- files.R    (revision 58240)
    +++ files.R    (working copy)
    @@ -116,7 +116,7 @@
         if(nt > nf) from <- rep(from, length.out = nt)
         okay <- file.exists(from)
         if (!overwrite) okay[file.exists(to)] <- FALSE
    -    if (any(from[okay] %in% to[okay]))
    +    if (any(normalizePath(from[okay]) %in% normalizePath(to[okay])))
             stop("file can not be copied both 'from' and 'to'")
         if (any(okay)) { # care: file.create could fail but file.append work.
             okay[okay] <- file.create(to[okay])

thanks
Ben Bolker
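To see what the patch changes, the condition before and after it can be pulled out of file.copy() and evaluated on the test case. This sketch assumes both paths exist and overwrite = TRUE, so every element of 'okay' is TRUE and the [okay] subsetting can be dropped; old_check() and new_check() are hypothetical names, not functions in R.

```r
## The test in file.copy() before and after the proposed patch, pulled
## out as standalone functions (hypothetical names). Assumes every
## element of 'okay' is TRUE, so the [okay] subsetting is dropped.
old_check <- function(from, to) any(from %in% to)
new_check <- function(from, to) any(normalizePath(from) %in% normalizePath(to))

old_wd <- setwd(tempdir())
x <- 1:3
dump("x", file = "tmp.dat")
old_result <- old_check("./tmp.dat", "tmp.dat")  # FALSE: the copy would go ahead
new_result <- new_check("./tmp.dat", "tmp.dat")  # TRUE: the copy would be stopped
unlink("tmp.dat")
setwd(old_wd)
```

One thing worth checking in the patch as written: with its default mustWork = NA, normalizePath() gives a warning for paths that do not exist, and with overwrite = TRUE the 'to' file in an ordinary copy often does not exist yet. Passing mustWork = FALSE to both calls would avoid that warning.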