Scott Kostyshak
2013-Sep-09 07:00 UTC
[Rd] tools::md5sum(directory) behavior different on Windows vs. Unix
tools::md5sum gives a warning if it receives a directory as an argument on Unix but not on Windows.

From what I understand, this happens because on Windows a directory is not treated as a file, so fopen returns NULL and NA is returned without a warning. On Unix, a directory is treated as a file, so fopen does not return NULL; md5 is then run on it and fails, which leads to a warning.

This is a good opportunity for me to understand further (in addition to [1] and the many places where OS special cases are mentioned) in which cases R tries to behave the same on Windows as on Unix and in which cases it allows for differences (in this case, a warning vs. no warning). For example, it would be straightforward to create a patch that would lead to the same behavior in this case: tools::md5sum could either issue a warning for each argument that is a directory or issue no warning at all (consistent with file.info). Would either patch be considered?

Or is this difference encouraged because the concept of a file is different on Unix than on Windows?

Scott

[1] http://cran.r-project.org/bin/windows/base/rw-FAQ.html#What-should-I-expect-to-behave-differently-from-the-Unix-version

--
Scott Kostyshak
Economics PhD Candidate
Princeton University
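A minimal R sketch for reproducing the difference described above, assuming only base R plus the tools package. The helper name md5_with_warnings is hypothetical. It calls tools::md5sum on a temporary directory and a temporary regular file and records any warnings instead of printing them, so the same script can be run on Windows and on Unix and the two results compared.

```r
# Hypothetical helper (not part of R): run tools::md5sum() and capture any
# warnings it raises, so per-platform behaviour can be inspected afterwards.
md5_with_warnings <- function(files) {
  msgs <- character(0)
  value <- withCallingHandlers(
    tools::md5sum(files),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # record the warning text
      invokeRestart("muffleWarning")         # and suppress printing it
    }
  )
  list(value = value, warnings = msgs)
}

# A temporary directory and a temporary regular file to test against.
dir_path <- tempfile("md5dir")
dir.create(dir_path)
file_path <- tempfile("md5file")
writeLines("hello", file_path)

res <- md5_with_warnings(c(dir_path, file_path))
print(res$value)     # the directory's entry is expected to be NA
print(res$warnings)  # per the report: a warning on Unix, none on Windows
```

Only the regular file's hash is platform-independent in shape (32 hex digits); whether res$warnings is empty is exactly the difference being reported.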
Scott Kostyshak
2013-Sep-29 08:16 UTC
[Rd] tools::md5sum(directory) behavior different on Windows vs. Unix
On Mon, Sep 9, 2013 at 3:00 AM, Scott Kostyshak <skostysh at princeton.edu> wrote:
> tools::md5sum gives a warning if it receives a directory as an
> argument on Unix but not on Windows.
>
> From what I understand, this happens because in Windows a directory is
> not treated as a file so fopen returns NULL. Then, NA is returned
> without a warning. On Unix, a directory is treated as a file so fopen
> does not return NULL so md5 is run and fails, leading to a warning.
>
> This is a good opportunity for me to understand further (in addition
> to [1] and the many places where OS special cases are mentioned) in
> which cases R tries to behave the same on Windows as on Unix and in
> which cases it allows for differences (in this case, a warning vs. no
> warning). For example, it would be straightforward to create a patch
> that would lead to the same behavior in this case. tools::md5sum could
> either issue a warning for each argument that is a directory or it
> could issue no warning (consistent with file.info). Would either patch
> be considered?

Attached is a patch that gives a warning if an element of the files argument is not a regular file (e.g. it is a directory or does not exist). In my opinion the advantages of this patch are:

(1) the same warnings are generated on all platforms when one of the elements is a directory;
(2) a warning is also given if a file does not exist.

Comments?

Scott

>
> Or is this difference encouraged because the concept of a file is
> different on Unix than on Windows?
>
> Scott
>
> [1] http://cran.r-project.org/bin/windows/base/rw-FAQ.html#What-should-I-expect-to-behave-differently-from-the-Unix-version
>
>
> --
> Scott Kostyshak
> Economics PhD Candidate
> Princeton University
-------------- next part --------------
Index: trunk/src/library/tools/R/md5.R
===================================================================
--- trunk/src/library/tools/R/md5.R	(revision 64011)
+++ trunk/src/library/tools/R/md5.R	(working copy)
@@ -17,7 +17,18 @@
 # http://www.r-project.org/Licenses/
 
 md5sum <- function(files)
-    structure(.Call(Rmd5, files), names=files)
+{
+    reg_ <- file_test("-f", files)
+    regFiles <- files[reg_]
+    notReg <- files[!reg_]
+    if(!all(reg_))
+        warning("The following are not regular files: ",
+                paste(shQuote(notReg), collapse = " "))
+    names(files) <- files
+    files[!reg_] <- NA
+    files[reg_] <- .Call(Rmd5, regFiles)
+    files
+}
 
 .installMD5sums <- function(pkgDir, outDir = pkgDir)
 {
Index: trunk/src/library/tools/man/md5sum.Rd
===================================================================
--- trunk/src/library/tools/man/md5sum.Rd	(revision 64011)
+++ trunk/src/library/tools/man/md5sum.Rd	(working copy)
@@ -18,7 +18,8 @@
 \value{
   A character vector of the same length as \code{files}, with names
   equal to \code{files}. The elements
-  will be \code{NA} for non-existent or unreadable files, otherwise
+  will be \code{NA} for non-existent or unreadable files (in which case
+  a warning will be generated), otherwise
   a 32-character string of hexadecimal digits.
 
   On Windows all files are read in binary mode (as the \code{md5sum}
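For trying the patched behaviour without rebuilding R, the same logic can be approximated outside the tools namespace. This is a minimal sketch, not the patch itself: the wrapper name md5sum_regular is hypothetical, and it calls the exported tools::md5sum on the entries that utils::file_test("-f", ...) accepts instead of the internal Rmd5 routine.

```r
# Hypothetical wrapper (not part of the tools package): mirrors the logic
# of the patch above using only exported functions.
md5sum_regular <- function(files) {
  # file_test("-f", ...) is TRUE only for existing paths that are not directories
  reg <- utils::file_test("-f", files)
  if (!all(reg))
    warning("The following are not regular files: ",
            paste(shQuote(files[!reg]), collapse = " "))
  out <- rep(NA_character_, length(files))
  names(out) <- files
  # Hash only the regular files; directories and missing paths stay NA
  out[reg] <- unname(tools::md5sum(files[reg]))
  out
}

# Usage: one regular file, one directory, one path that does not exist.
f <- tempfile("regular"); writeLines("x", f)
d <- tempfile("folder");  dir.create(d)
m <- tempfile("missing")

res <- suppressWarnings(md5sum_regular(c(f, d, m)))
print(res)  # a hash for f, NA for d and m
```

Checking the regular-file mask in R makes the warning identical on every platform, at the cost of one extra file-system lookup per path. Like the patch, it does not warn for a regular file that exists but cannot be read; that entry still comes back NA silently.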