In a number of places internal to R, we need to know which files have changed (e.g. after building a vignette). I've just written a general purpose function "changedFiles" that I'll probably commit to R-devel. Comments on the design (or bug reports) would be appreciated. The source for the function and the Rd page for it are inline below. ----- changedFiles.R: changedFiles <- function(snapshot, timestamp = tempfile("timestamp"), file.info = NULL, md5sum = FALSE, full.names = FALSE, ...) { dosnapshot <- function(args) { fullnames <- do.call(list.files, c(full.names = TRUE, args)) names <- do.call(list.files, c(full.names = full.names, args)) if (isTRUE(file.info) || (is.character(file.info) && length(file.info))) { info <- file.info(fullnames) rownames(info) <- names if (isTRUE(file.info)) file.info <- c("size", "isdir", "mode", "mtime") } else info <- data.frame(row.names=names) if (md5sum) info <- data.frame(info, md5sum = tools::md5sum(fullnames)) list(info = info, timestamp = timestamp, file.info = file.info, md5sum = md5sum, full.names = full.names, args = args) } if (missing(snapshot) || !inherits(snapshot, "changedFilesSnapshot")) { if (length(timestamp) == 1) file.create(timestamp) if (missing(snapshot)) snapshot <- "." pre <- dosnapshot(list(path = snapshot, ...)) pre$pre <- pre$info pre$info <- NULL pre$wd <- getwd() class(pre) <- "changedFilesSnapshot" return(pre) } if (missing(timestamp)) timestamp <- snapshot$timestamp if (missing(file.info) || isTRUE(file.info)) file.info <- snapshot$file.info if (identical(file.info, FALSE)) file.info <- NULL if (missing(md5sum)) md5sum <- snapshot$md5sum if (missing(full.names)) full.names <- snapshot$full.names pre <- snapshot$pre savewd <- getwd() on.exit(setwd(savewd)) setwd(snapshot$wd) args <- snapshot$args newargs <- list(...) args[names(newargs)] <- newargs post <- dosnapshot(args)$info prenames <- rownames(pre) postnames <- rownames(post) added <- setdiff(postnames, prenames) deleted <- setdiff(prenames, postnames) common <- intersect(prenames, postnames) if (length(file.info)) { preinfo <- pre[common, file.info] postinfo <- post[common, file.info] changes <- preinfo != postinfo } else changes <- matrix(logical(0), nrow = length(common), ncol = 0, dimnames = list(common, character(0))) if (length(timestamp)) changes <- cbind(changes, Newer = file_test("-nt", common, timestamp)) if (md5sum) { premd5 <- pre[common, "md5sum"] postmd5 <- post[common, "md5sum"] changes <- cbind(changes, md5sum = premd5 != postmd5) } changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE] changed <- rownames(changes1) structure(list(added = added, deleted = deleted, changed = changed, unchanged = setdiff(common, changed), changes = changes), class = "changedFiles") } print.changedFilesSnapshot <- function(x, ...) { cat("changedFiles snapshot:\n timestamp = \"", x$timestamp, "\"\n file.info = ", if (length(x$file.info)) paste(paste0('"', x$file.info, '"'), collapse=","), "\n md5sum = ", x$md5sum, "\n args = ", deparse(x$args, control = NULL), "\n", sep="") x } print.changedFiles <- function(x, ...) { if (length(x$added)) cat("Files added:\n", paste0(" ", x$added, collapse="\n"), "\n", sep="") if (length(x$deleted)) cat("Files deleted:\n", paste0(" ", x$deleted, collapse="\n"), "\n", sep="") changes <- x$changes changes <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop=FALSE] changes <- changes[, colSums(changes, na.rm = TRUE) > 0, drop=FALSE] if (nrow(changes)) { cat("Files changed:\n") print(changes) } x } ---------------------- --- changedFiles.Rd: \name{changedFiles} \alias{changedFiles} \alias{print.changedFiles} \alias{print.changedFilesSnapshot} \title{ Detect which files have changed } \description{ On the first call, \code{changedFiles} takes a snapshot of a selection of files. In subsequent calls, it takes another snapshot, and returns an object containing data on the differences between the two snapshots. The snapshots need not be the same directory; this could be used to compare two directories. } \usage{ changedFiles(snapshot, timestamp = tempfile("timestamp"), file.info = NULL, md5sum = FALSE, full.names = FALSE, ...) } \arguments{ \item{snapshot}{ The path to record, or a previous snapshot. See the Details. } \item{timestamp}{ The name of a file to write at the time the initial snapshot is taken. In subsequent calls, modification times of files will be compared to this file, and newer files will be reported as changed. Set to \code{NULL} to skip this test. } \item{file.info}{ A vector of columns from the result of the \code{file.info} function, or a logical value. If \code{TRUE}, columns \code{c("size", "isdir", "mode", "mtime")} will be used. Set to \code{FALSE} or \code{NULL} to skip this test. See the Details. } \item{md5sum}{ A logical value indicating whether MD5 summaries should be taken as part of the snapshot. } \item{full.names}{ A logical value indicating whether full names (as in \code{\link{list.files}}) should be recorded. } \item{\dots}{ Additional parameters to pass to \code{\link{list.files}} to control the set of files in the snapshots. } } \details{ This function works in two modes. If the \code{snapshot} argument is missing or is not of S3 class \code{"changedFilesSnapshot"}, it is used as the \code{path} argument to \code{\link{list.files}} to obtain a list of files. If it is of class \code{"changedFilesSnapshot"}, then it is taken to be the baseline file and a new snapshot is taken and compared with it. In the latter case, missing arguments default to match those from the initial snapshot. If the \code{timestamp} argument is length 1, a file with that name is created in the current directory during the initial snapshot, and \code{\link{file_test}} is used to compare the age of all files to it during subsequent calls. If the \code{file.info} argument is \code{TRUE} or it contains a non-empty character vector, the indicated columns from the result of a call to \code{\link{file.info}} will be recorded and compared. If \code{md5sum} is \code{TRUE}, the \code{tools::\link{md5sum}} function will be called to record the 32 byte MD5 checksum for each file, and these values will be compared. } \value{ In the initial snapshot phase, an object of class \code{"changedFilesSnapshot"} is returned. This is a list containing the fields \item{pre}{a dataframe whose rownames are the filenames, and whose columns contain the requested snapshot data} \item{timestamp, file.info, md5sum, full.names}{a record of the arguments in the initial call} \item{args}{other arguments passed via \code{...} to \code{\link{list.files}}.} In the comparison phase, an object of class \code{"changedFiles"}. This is a list containing \item{added, deleted, changed, unchanged}{character vectors of filenames from the before and after snapshots, with obvious meanings} \item{changes}{a logical matrix with a row for each common file, and a column for each comparison test. \code{TRUE} indicates a change in that test.} \code{\link{print}} methods are defined for each of these types. The \code{\link{print}} method for \code{"changedFilesSnapshot"} objects displays the arguments used to produce it, while the one for \code{"changedFiles"} displays the \code{added}, \code{deleted} and \code{changed} fields if non-empty, and a submatrix of the \code{changes} matrix containing all of the \code{TRUE} values. } \author{ Duncan Murdoch } \seealso{ \code{\link{file.info}}, \code{\link{file_test}}, \code{\link{md5sum}}. } \examples{ # Create some files in a temporary directory dir <- tempfile() dir.create(dir) writeBin(1, file.path(dir, "file1")) writeBin(2, file.path(dir, "file2")) dir.create(file.path(dir, "dir")) # Take a snapshot snapshot <- changedFiles(dir, file.info=TRUE, md5sum=TRUE) # Change one of the files writeBin(3, file.path(dir, "file2")) # Display the detected changes changedFiles(snapshot) changedFiles(snapshot)$changes } \keyword{utilities} \keyword{file}
Hi Duncan, I think this functionality would be much easier to use and understand if you split it up the functionality of taking snapshots and comparing them into separate functions. In addition, the 'timestamp' functionality seems both confusing and brittle to me. I think it would be better to store file modification times in the snapshot and use those instead of an external file. Maybe: # Take a snapshot of the files. takeFileSnapshot(directory, file.info = TRUE, md5sum = FALSE, full.names FALSE, recursive = TRUE, ...) # Take a snapshot using the same options as used for snapshot. retakeFileSnapshot(snapshot, directory = snapshot$directory) { takeFileSnapshot)(directory, file.info = snapshot$file.info, md5sum snapshot$md5sum, etc) } compareFileSnapshots(snapshot1, snapshot2) - or - getNewFiles(snapshat1, snapshot2) # These names are probably too generic getDeletedFiles(snapshot1, snapshot2) getUpdatedFiles(snapshot1, snapshot2) -or- setdiff(snapshot1, snapshot2) # Unclear how this should treat updated files This approach does have the difficulty that users could attempt to compare snapshots that were taken with different options and that can't be compared, but that should be an easy error to detect. Karl On Wed, Sep 4, 2013 at 10:53 AM, Duncan Murdoch <murdoch.duncan@gmail.com>wrote:> In a number of places internal to R, we need to know which files have > changed (e.g. after building a vignette). I've just written a general > purpose function "changedFiles" that I'll probably commit to R-devel. > Comments on the design (or bug reports) would be appreciated. > > The source for the function and the Rd page for it are inline below. > > ----- changedFiles.R: > changedFiles <- function(snapshot, timestamp = tempfile("timestamp"), > file.info = NULL, > md5sum = FALSE, full.names = FALSE, ...) { > dosnapshot <- function(args) { > fullnames <- do.call(list.files, c(full.names = TRUE, args)) > names <- do.call(list.files, c(full.names = full.names, args)) > if (isTRUE(file.info) || (is.character(file.info) && length( > file.info))) { > info <- file.info(fullnames) > rownames(info) <- names > if (isTRUE(file.info)) > file.info <- c("size", "isdir", "mode", "mtime") > } else > info <- data.frame(row.names=names) > if (md5sum) > info <- data.frame(info, md5sum = tools::md5sum(fullnames)) > list(info = info, timestamp = timestamp, file.info = file.info, > md5sum = md5sum, full.names = full.names, args = args) > } > if (missing(snapshot) || !inherits(snapshot, "changedFilesSnapshot")) { > if (length(timestamp) == 1) > file.create(timestamp) > if (missing(snapshot)) snapshot <- "." > pre <- dosnapshot(list(path = snapshot, ...)) > pre$pre <- pre$info > pre$info <- NULL > pre$wd <- getwd() > class(pre) <- "changedFilesSnapshot" > return(pre) > } > > if (missing(timestamp)) timestamp <- snapshot$timestamp > if (missing(file.info) || isTRUE(file.info)) file.info <- snapshot$ > file.info > if (identical(file.info, FALSE)) file.info <- NULL > if (missing(md5sum)) md5sum <- snapshot$md5sum > if (missing(full.names)) full.names <- snapshot$full.names > > pre <- snapshot$pre > savewd <- getwd() > on.exit(setwd(savewd)) > setwd(snapshot$wd) > > args <- snapshot$args > newargs <- list(...) > args[names(newargs)] <- newargs > post <- dosnapshot(args)$info > prenames <- rownames(pre) > postnames <- rownames(post) > > added <- setdiff(postnames, prenames) > deleted <- setdiff(prenames, postnames) > common <- intersect(prenames, postnames) > > if (length(file.info)) { > preinfo <- pre[common, file.info] > postinfo <- post[common, file.info] > changes <- preinfo != postinfo > } > else changes <- matrix(logical(0), nrow = length(common), ncol = 0, > dimnames = list(common, character(0))) > if (length(timestamp)) > changes <- cbind(changes, Newer = file_test("-nt", common, > timestamp)) > if (md5sum) { > premd5 <- pre[common, "md5sum"] > postmd5 <- post[common, "md5sum"] > changes <- cbind(changes, md5sum = premd5 != postmd5) > } > changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE] > changed <- rownames(changes1) > structure(list(added = added, deleted = deleted, changed = changed, > unchanged = setdiff(common, changed), changes = changes), class > "changedFiles") > } > > print.changedFilesSnapshot <- function(x, ...) { > cat("changedFiles snapshot:\n timestamp = \"", x$timestamp, "\"\n > file.info = ", > if (length(x$file.info)) paste(paste0('"', x$file.info, '"'), > collapse=","), > "\n md5sum = ", x$md5sum, "\n args = ", deparse(x$args, control > NULL), "\n", sep="") > x > } > > print.changedFiles <- function(x, ...) { > if (length(x$added)) cat("Files added:\n", paste0(" ", x$added, > collapse="\n"), "\n", sep="") > if (length(x$deleted)) cat("Files deleted:\n", paste0(" ", > x$deleted, collapse="\n"), "\n", sep="") > changes <- x$changes > changes <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop=FALSE] > changes <- changes[, colSums(changes, na.rm = TRUE) > 0, drop=FALSE] > if (nrow(changes)) { > cat("Files changed:\n") > print(changes) > } > x > } > ---------------------- > > --- changedFiles.Rd: > \name{changedFiles} > \alias{changedFiles} > \alias{print.changedFiles} > \alias{print.**changedFilesSnapshot} > \title{ > Detect which files have changed > } > \description{ > On the first call, \code{changedFiles} takes a snapshot of a selection of > files. In subsequent > calls, it takes another snapshot, and returns an object containing data on > the > differences between the two snapshots. The snapshots need not be the same > directory; > this could be used to compare two directories. > } > \usage{ > changedFiles(snapshot, timestamp = tempfile("timestamp"), file.info > NULL, > md5sum = FALSE, full.names = FALSE, ...) > } > \arguments{ > \item{snapshot}{ > The path to record, or a previous snapshot. See the Details. > } > \item{timestamp}{ > The name of a file to write at the time the initial snapshot > is taken. In subsequent calls, modification times of files will be > compared to > this file, and newer files will be reported as changed. Set to \code{NULL} > to skip this test. > } > \item{file.info}{ > A vector of columns from the result of the \code{file.info} function, or > a logical value. If > \code{TRUE}, columns \code{c("size", "isdir", "mode", "mtime")} will be > used. Set to > \code{FALSE} or \code{NULL} to skip this test. See the Details. > } > \item{md5sum}{ > A logical value indicating whether MD5 summaries should be taken as part > of the snapshot. > } > \item{full.names}{ > A logical value indicating whether full names (as in > \code{\link{list.files}}) should be > recorded. > } > \item{\dots}{ > Additional parameters to pass to \code{\link{list.files}} to control the > set of files > in the snapshots. > } > } > \details{ > This function works in two modes. If the \code{snapshot} argument is > missing or is > not of S3 class \code{"changedFilesSnapshot"}, it is used as the > \code{path} argument > to \code{\link{list.files}} to obtain a list of files. If it is of class > \code{"changedFilesSnapshot"}, then it is taken to be the baseline file > and a new snapshot is taken and compared with it. In the latter case, > missing > arguments default to match those from the initial snapshot. > > If the \code{timestamp} argument is length 1, a file with that name is > created > in the current directory during the initial snapshot, and > \code{\link{file_test}} > is used to compare the age of all files to it during subsequent calls. > > If the \code{file.info} argument is \code{TRUE} or it contains a non-empty > character vector, the indicated columns from the result of a call to > \code{\link{file.info}} will be recorded and compared. > > If \code{md5sum} is \code{TRUE}, the \code{tools::\link{md5sum}} function > will be called to record the 32 byte MD5 checksum for each file, and these > values > will be compared. > } > \value{ > In the initial snapshot phase, an object of class > \code{"changedFilesSnapshot"} is returned. This > is a list containing the fields > \item{pre}{a dataframe whose rownames are the filenames, and whose columns > contain the > requested snapshot data} > \item{timestamp, file.info, md5sum, full.names}{a record of the arguments > in the initial call} > \item{args}{other arguments passed via \code{...} to > \code{\link{list.files}}.} > > In the comparison phase, an object of class \code{"changedFiles"}. This is > a list containing > \item{added, deleted, changed, unchanged}{character vectors of filenames > from the before > and after snapshots, with obvious meanings} > \item{changes}{a logical matrix with a row for each common file, and a > column for each > comparison test. \code{TRUE} indicates a change in that test.} > > \code{\link{print}} methods are defined for each of these types. The > \code{\link{print}} method for \code{"changedFilesSnapshot"} objects > displays the arguments used to produce it, while the one for > \code{"changedFiles"} displays the \code{added}, \code{deleted} > and \code{changed} fields if non-empty, and a submatrix of the > \code{changes} > matrix containing all of the \code{TRUE} values. > } > \author{ > Duncan Murdoch > } > \seealso{ > \code{\link{file.info}}, \code{\link{file_test}}, \code{\link{md5sum}}. > } > \examples{ > # Create some files in a temporary directory > dir <- tempfile() > dir.create(dir) > writeBin(1, file.path(dir, "file1")) > writeBin(2, file.path(dir, "file2")) > dir.create(file.path(dir, "dir")) > > # Take a snapshot > snapshot <- changedFiles(dir, file.info=TRUE, md5sum=TRUE) > > # Change one of the files > writeBin(3, file.path(dir, "file2")) > > # Display the detected changes > changedFiles(snapshot) > changedFiles(snapshot)$changes > } > \keyword{utilities} > \keyword{file} > > ______________________________**________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/**listinfo/r-devel<https://stat.ethz.ch/mailman/listinfo/r-devel> >[[alternative HTML version deleted]]
On Wed, Sep 4, 2013 at 1:53 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:> In a number of places internal to R, we need to know which files have > changed (e.g. after building a vignette). I've just written a general > purpose function "changedFiles" that I'll probably commit to R-devel. > Comments on the design (or bug reports) would be appreciated. > > The source for the function and the Rd page for it are inline below.This looks like a useful function. Thanks for writing it. I have only one (picky) comment below.> ----- changedFiles.R: > changedFiles <- function(snapshot, timestamp = tempfile("timestamp"), > file.info = NULL, > md5sum = FALSE, full.names = FALSE, ...) { > dosnapshot <- function(args) { > fullnames <- do.call(list.files, c(full.names = TRUE, args)) > names <- do.call(list.files, c(full.names = full.names, args)) > if (isTRUE(file.info) || (is.character(file.info) && > length(file.info))) { > info <- file.info(fullnames) > rownames(info) <- names > if (isTRUE(file.info)) > file.info <- c("size", "isdir", "mode", "mtime") > } else > info <- data.frame(row.names=names) > if (md5sum) > info <- data.frame(info, md5sum = tools::md5sum(fullnames)) > list(info = info, timestamp = timestamp, file.info = file.info, > md5sum = md5sum, full.names = full.names, args = args) > } > if (missing(snapshot) || !inherits(snapshot, "changedFilesSnapshot")) { > if (length(timestamp) == 1) > file.create(timestamp) > if (missing(snapshot)) snapshot <- "." > pre <- dosnapshot(list(path = snapshot, ...)) > pre$pre <- pre$info > pre$info <- NULL > pre$wd <- getwd() > class(pre) <- "changedFilesSnapshot" > return(pre) > } > > if (missing(timestamp)) timestamp <- snapshot$timestamp > if (missing(file.info) || isTRUE(file.info)) file.info <- > snapshot$file.info > if (identical(file.info, FALSE)) file.info <- NULL > if (missing(md5sum)) md5sum <- snapshot$md5sum > if (missing(full.names)) full.names <- snapshot$full.names > > pre <- snapshot$pre > savewd <- getwd() > on.exit(setwd(savewd)) > setwd(snapshot$wd) > > args <- snapshot$args > newargs <- list(...) > args[names(newargs)] <- newargs > post <- dosnapshot(args)$info > prenames <- rownames(pre) > postnames <- rownames(post) > > added <- setdiff(postnames, prenames) > deleted <- setdiff(prenames, postnames) > common <- intersect(prenames, postnames) > > if (length(file.info)) { > preinfo <- pre[common, file.info] > postinfo <- post[common, file.info] > changes <- preinfo != postinfo > } > else changes <- matrix(logical(0), nrow = length(common), ncol = 0, > dimnames = list(common, character(0))) > if (length(timestamp)) > changes <- cbind(changes, Newer = file_test("-nt", common, > timestamp)) > if (md5sum) { > premd5 <- pre[common, "md5sum"] > postmd5 <- post[common, "md5sum"] > changes <- cbind(changes, md5sum = premd5 != postmd5) > } > changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE] > changed <- rownames(changes1) > structure(list(added = added, deleted = deleted, changed = changed, > unchanged = setdiff(common, changed), changes = changes), class > "changedFiles") > } > > print.changedFilesSnapshot <- function(x, ...) { > cat("changedFiles snapshot:\n timestamp = \"", x$timestamp, "\"\n > file.info = ", > if (length(x$file.info)) paste(paste0('"', x$file.info, '"'), > collapse=","), > "\n md5sum = ", x$md5sum, "\n args = ", deparse(x$args, control > NULL), "\n", sep="") > x > } > > print.changedFiles <- function(x, ...) { > if (length(x$added)) cat("Files added:\n", paste0(" ", x$added, > collapse="\n"), "\n", sep="") > if (length(x$deleted)) cat("Files deleted:\n", paste0(" ", x$deleted, > collapse="\n"), "\n", sep="") > changes <- x$changes > changes <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop=FALSE] > changes <- changes[, colSums(changes, na.rm = TRUE) > 0, drop=FALSE] > if (nrow(changes)) { > cat("Files changed:\n") > print(changes) > } > x > } > ---------------------- > > --- changedFiles.Rd: > \name{changedFiles} > \alias{changedFiles} > \alias{print.changedFiles} > \alias{print.changedFilesSnapshot} > \title{ > Detect which files have changed > } > \description{ > On the first call, \code{changedFiles} takes a snapshot of a selection of > files. In subsequent > calls, it takes another snapshot, and returns an object containing data on > the > differences between the two snapshots. The snapshots need not be the same > directory; > this could be used to compare two directories. > } > \usage{ > changedFiles(snapshot, timestamp = tempfile("timestamp"), file.info = NULL, > md5sum = FALSE, full.names = FALSE, ...) > } > \arguments{ > \item{snapshot}{ > The path to record, or a previous snapshot. See the Details. > } > \item{timestamp}{ > The name of a file to write at the time the initial snapshot > is taken. In subsequent calls, modification times of files will be compared > to > this file, and newer files will be reported as changed. Set to \code{NULL} > to skip this test. > } > \item{file.info}{ > A vector of columns from the result of the \code{file.info} function, or a > logical value. If > \code{TRUE}, columns \code{c("size", "isdir", "mode", "mtime")} will be > used. Set to > \code{FALSE} or \code{NULL} to skip this test. See the Details. > } > \item{md5sum}{ > A logical value indicating whether MD5 summaries should be taken as part of > the snapshot. > } > \item{full.names}{ > A logical value indicating whether full names (as in > \code{\link{list.files}}) should be > recorded. > } > \item{\dots}{ > Additional parameters to pass to \code{\link{list.files}} to control the set > of files > in the snapshots. > } > } > \details{ > This function works in two modes. If the \code{snapshot} argument is > missing or is > not of S3 class \code{"changedFilesSnapshot"}, it is used as the \code{path} > argument > to \code{\link{list.files}} to obtain a list of files. If it is of class > \code{"changedFilesSnapshot"}, then it is taken to be the baseline file > and a new snapshot is taken and compared with it. In the latter case, > missing > arguments default to match those from the initial snapshot. > > If the \code{timestamp} argument is length 1, a file with that name is > created > in the current directory during the initial snapshot, and > \code{\link{file_test}} > is used to compare the age of all files to it during subsequent calls. > > If the \code{file.info} argument is \code{TRUE} or it contains a non-empty > character vector, the indicated columns from the result of a call to > \code{\link{file.info}} will be recorded and compared. > > If \code{md5sum} is \code{TRUE}, the \code{tools::\link{md5sum}} function > will be called to record the 32 byte MD5 checksum for each file, and these > values > will be compared. > } > \value{ > In the initial snapshot phase, an object of class > \code{"changedFilesSnapshot"} is returned. This > is a list containing the fields > \item{pre}{a dataframe whose rownames are the filenames, and whose columns > contain the > requested snapshot data} > \item{timestamp, file.info, md5sum, full.names}{a record of the arguments in > the initial call} > \item{args}{other arguments passed via \code{...} to > \code{\link{list.files}}.} > > In the comparison phase, an object of class \code{"changedFiles"}. This is a > list containing > \item{added, deleted, changed, unchanged}{character vectors of filenames > from the before > and after snapshots, with obvious meanings} > \item{changes}{a logical matrix with a row for each common file, and a > column for each > comparison test. \code{TRUE} indicates a change in that test.} > > \code{\link{print}} methods are defined for each of these types. The > \code{\link{print}} method for \code{"changedFilesSnapshot"} objects > displays the arguments used to produce it, while the one for > \code{"changedFiles"} displays the \code{added}, \code{deleted} > and \code{changed} fields if non-empty, and a submatrix of the > \code{changes} > matrix containing all of the \code{TRUE} values. > } > \author{ > Duncan Murdoch > } > \seealso{ > \code{\link{file.info}}, \code{\link{file_test}}, \code{\link{md5sum}}. > } > \examples{ > # Create some files in a temporary directory > dir <- tempfile() > dir.create(dir)Should a different name than 'dir' be used since 'dir' is a base function? Further, if someone is not very familiar with R (or just not in "R mode" at the time of reading), they might think that 'dir.create' is calling the create member of the object named 'dir' that you just made. Scott> writeBin(1, file.path(dir, "file1")) > writeBin(2, file.path(dir, "file2")) > dir.create(file.path(dir, "dir")) > > # Take a snapshot > snapshot <- changedFiles(dir, file.info=TRUE, md5sum=TRUE) > > # Change one of the files > writeBin(3, file.path(dir, "file2")) > > # Display the detected changes > changedFiles(snapshot) > changedFiles(snapshot)$changes > } > \keyword{utilities} > \keyword{file} > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel-- Scott Kostyshak Economics PhD Candidate Princeton University
Dr Gregory Jefferis
2013-Sep-05 16:32 UTC
[Rd] Comments requested on "changedFiles" function
Dear Duncan, This certainly looks useful. Might you consider adding the ability to supply an alternative digest function? Details below. I often use a homemade "make" type function which starts by looking at modification times e.g. in a private package https://github.com/jefferis/nat.utils/blob/master/R/make.r For some of my work, I use hash functions. However because I typically work with many large files I often use a special digest process e.g. using the crc checksum embedded in a gzip file directly or hashing only the part of a large file that is (almost) certain to change. Perhaps (code unchecked) along the lines of: changedFiles <- function(snapshot, timestamp = tempfile("timestamp"), file.info = NULL, digest = FALSE, digestfun=NULL, full.names = FALSE, ...) if(digest){ if(is.null(digestfun)) digestfun=tools::md5sum else digestfun=match.fun(digestfun) info <- data.frame(info, digest = digestfun(fullnames)) } etc OR alternatively using only one argument: changedFiles <- function(snapshot, timestamp = tempfile("timestamp"), file.info = NULL, digest = FALSE, full.names = FALSE, ...) if(is.logical(digest)){ if(digest) digestfun=tools::md5sum } else { # Assume that digest specifies a function that we want to use digestfun=match.fun(digest) digest=TRUE } if(digest) info <- data.frame(info, digest = digestfun(fullnames)) etc Many thanks, Greg. On 4 Sep 2013, at 18:53, Duncan Murdoch wrote:> In a number of places internal to R, we need to know which files have > changed (e.g. after building a vignette). I've just written a general > purpose function "changedFiles" that I'll probably commit to R-devel. > Comments on the design (or bug reports) would be appreciated. > > The source for the function and the Rd page for it are inline below. > > ----- changedFiles.R: > changedFiles <- function(snapshot, timestamp = tempfile("timestamp"), > file.info = NULL, > md5sum = FALSE, full.names = FALSE, ...) { > dosnapshot <- function(args) { > fullnames <- do.call(list.files, c(full.names = TRUE, args)) > names <- do.call(list.files, c(full.names = full.names, args)) > if (isTRUE(file.info) || (is.character(file.info) && > length(file.info))) { > info <- file.info(fullnames) > rownames(info) <- names > if (isTRUE(file.info)) > file.info <- c("size", "isdir", "mode", "mtime") > } else > info <- data.frame(row.names=names) > if (md5sum) > info <- data.frame(info, md5sum = tools::md5sum(fullnames)) > list(info = info, timestamp = timestamp, file.info = file.info, > md5sum = md5sum, full.names = full.names, args = args)-- Gregory Jefferis, PhD Tel: 01223 267048 Division of Neurobiology MRC Laboratory of Molecular Biology Francis Crick Avenue Cambridge Biomedical Campus Cambridge, CB2 OQH, UK http://www2.mrc-lmb.cam.ac.uk/group-leaders/h-to-m/g-jefferis http://jefferislab.org http://flybrain.stanford.edu
Thanks for everyone's comments on this. I have now committed a version to R-devel. I don't plan to backport it to 3.0.2 (coming out in a couple of weeks), but it should appear in 3.1.0 in the spring, and it's conceivable it could make it into 3.0.3 (not yet scheduled). Duncan Murdoch