Consider the question we had recently: "how do I count the lines in a file
without reading it into R?"  The solution I suggested was

    as.numeric(system(paste("wc -l <", filename), TRUE))

Unfortunately, it doesn't work, or at least, not all the time.
If you already know all about that, and don't care, or already have
a solution, stop reading now.  Otherwise, let me try to undo any
harm I may have done by providing a fuller solution.

We've had several reports in this list about problems caused by Windows
file names with spaces in them.  File names with spaces are also common
in MacOS X, so common, in fact, that file name completion in a Terminal
actually works (if you have a file name "Foo Bar", and type F, o, TAB
you get Foo\ Bar).  File names with spaces are possible in other Unix
systems too, and always have been, though they are less likely.

So suppose there is a file "Foo Bar" you want to find the size of.

    > file.name <- "Foo Bar"
    > system(paste("wc -l <", file.name))

executes the command

    wc -l < Foo Bar

which gives you the size of Bar if there is one, or fails if there is not,
and ignores Foo (should there be one) and of course ignores "Foo Bar".

What can we do about it?  Well, we can try this:

    for.system <- function (s) gsub(" ", "\\\\ ", s)

    system(paste("wc -l <", for.system(file.name)), TRUE)

Great.  Works for files with spaces in their names.  Now we try some other
file names.  (File names like this are abundant in MacOS X.)

    file.name <- "Black & White Minstrels/1972"

Whoops.  wc -l < Black\ &\ White\ Minstrels/1972
forks off "wc -l <Black\ " and then tries to run
"\ White\ Minstrels/1972".

    file.name <- "Quake(R)/scores"

Whoops.  "Badly placed ()'s".

    file.name <- "Drunkard's walk/log-1"

Whoops.  "Unmatched '"

So try again.

    for.system <-
        function (s) gsub("([][)(}{'\";&! \t])", "\\\\\\1", s)

    line.count <-
        function (s) as.numeric(system(paste("wc -l <", for.system(s)), TRUE))

This _still_ isn't perfect, but it is a whole lot better than the naive
version.
The major remaining problem is that the set of special characters
and the quoting mechanism need to be changed for Windows.  I _think_ the
Windows version should be something like

    for.system <- function (s) {
        i <- grep("[^-_:.A-Za-z0-9/\\\\]", s)
        s[i] <- sapply(s[i], function (s) paste("\"", s, "\"", sep=""))
        s
    }

But what if a file name contains a double quote?  Until someone tells me,
I'm just going to hope it doesn't happen.  Putting the pieces together,

    f% cat >"Foo Bar"
    a b c
    d e
    f
    <EOF>

    for.system <-
        if (.Platform$OS.type == "windows") {
            function (s) {
                i <- grep("[^-_:.A-Za-z0-9/\\\\]", s)
                s[i] <- sapply(s[i], function (s) paste("\"", s, "\"", sep=""))
                s
            }
        } else {
            function (s) gsub("([][)(}{'\";&! \t\n])", "\\\\\\1", s)
        }

    wc <- function (s) {
        r <- scan(pipe(paste("wc <", for.system(s)), open="r"), n=3, quiet=TRUE)
        names(r) <- c("lines", "words", "chars")
        r
    }

    > wc("Foo Bar")
    lines words chars
        3     6    12
    > system("cp $HOME/.login Drunkard\\'s\\ Walk")
    > wc("Drunkard's Walk")["chars"]
    chars
     3633
    >

If there's already something like for.system() built into R, I'd be very
happy to know about it.  (It's a little odd that system() and pipe()
don't already support something like this; in a multi-element character
vector the first could be taken literally and the remaining ones could be
taken quoted with leading spaces.)
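To see what the escaping version hands to the shell, the command strings can be built and printed without running anything. This is only a sketch: it reuses the Unix branch of for.system() from the post, and the vector problem.names is a made-up name that collects the file names used as examples there.

```r
# Unix branch of for.system() from the post: put a backslash in front of
# each character in the set that the shell would otherwise interpret.
for.system <- function (s) gsub("([][)(}{'\";&! \t\n])", "\\\\\\1", s)

# Illustrative vector (hypothetical name) holding the example file names.
problem.names <- c("Foo Bar",
                   "Black & White Minstrels/1972",
                   "Quake(R)/scores",
                   "Drunkard's walk/log-1")

# Build the commands that would be passed to system(), but only print them.
commands <- paste("wc -l <", for.system(problem.names))
writeLines(commands)
```

Characters outside the set, such as $, `, * and |, are passed through unescaped, which is one reason the post calls this version still not perfect.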
The normal way to do this is to quote the string, here filename.  See
?shQuote.

Your comments really are not fair: a lot of work has been put into
supporting paths containing spaces on both Windows and Unix by the R
developers (or warning that they are not supported), but not by users.
That includes researching and writing functions like shQuote.

On Thu, 9 Dec 2004, Richard A. O'Keefe wrote:

> We've had several reports in this list about problems caused by Windows
> file names with spaces in them.  File names with spaces are also common
> in MacOS X, so common, in fact, that file name completion in a Terminal
> actually works (if you have a file name "Foo Bar", and type F, o, TAB
> you get Foo\ Bar).  File names with spaces are possible in other Unix
> systems too, and always have been, though they are less likely.

That's been a feature of Unix shells with file completion (e.g. tcsh) for
at least a decade -- credit where credit is due, please.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  stats.ox.ac.uk/~ripley
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
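For readers following along, the shQuote() suggestion in the reply above can be applied to the original line-counting question roughly as follows. This is a sketch, not a tested recipe for every platform: it assumes a Unix-alike where wc is available and system() runs the command through sh. The name line.count comes from the original post; the temporary file is made up for the demonstration.

```r
# Count lines with wc, letting shQuote() do the quoting.
# Assumption: a Unix-alike where system() passes the command to sh.
line.count <- function (file.name) {
    out <- system(paste("wc -l <", shQuote(file.name)), intern = TRUE)
    as.numeric(out)   # wc may pad the number with spaces; as.numeric copes
}

# Try it on a temporary file whose name has a space and an apostrophe.
tmp <- file.path(tempdir(), "Drunkard's Walk")
writeLines(c("a b c", "d e", "f"), tmp)
n <- line.count(tmp)
print(n)
unlink(tmp)
```

The file is never read into R; only the single number printed by wc comes back through system().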
Brian D Ripley <ripley at stats.ox.ac.uk> wrote:

    The normal way to do this is to quote the string, here filename.  See
    ?shQuote.

    Your comments really are not fair:

My comments were to the effect that *I* had recommended an approach that
didn't quite work.  There's nothing unfair to R or the R team in that.
I tried to do something helpful about it.

My next comment was:

> If there's already something like for.system() built into R, I'd be very
> happy to know about it.

I don't see anything unfair about that either.  It's a clear suggestion
that I expected that there _was_ something like for.system() built into R.
What's unfair about hinting that I expect the problem is already solved?
I didn't say that R didn't *have* a solution, just that *I* didn't know
about it.

In retrospect, that WAS unfair.  It was unfair to me.  Because you see,
I did go looking.  THERE IS NOTHING ABOUT shQuote in ?system or ?pipe.
Not in the text, not in the examples, and not in the See Also section.

> (It's a little odd that system() and pipe()
> don't already support something like this;

This is the one and only comment which could in any way be taken as
critical of R or the R team.  I stand by my comment that

> in a multi-element character
> vector the first could be taken literally and the remaining ones could be
> taken quoted with leading spaces.)
This would be

    old.system <- system
    system <- function (command, intern = FALSE, ignore.stderr = FALSE) {
        if (length(command) > 1)
            command <- do.call("paste",
                c(command[1], lapply(command[2:length(command)], shQuote)))
        old.system(command, intern, ignore.stderr)
    }

or, more directly,

    system <- function (command, intern = FALSE, ignore.stderr = FALSE) {
        if (length(command) > 1)
            command <- do.call("paste",
                c(command[1], lapply(command[2:length(command)], shQuote)))
        .Internal(system(if (ignore.stderr) paste(command, "2>/dev/null")
                         else command, intern))
    }

This would mean that something like

    system(c("mv", old.name, new.name))

would work _without_ the user having to remember to call shQuote.

This leads me to pipe().  In ?pipe we read

    description: character. A description of the connection. For 'file'
              and 'pipe' this is a path to the file to be opened. For 'url'
              it is a complete URL, including schemes ('http://', 'ftp://'
              or 'file://').  'file' also accepts complete URLs.

This should be

    description: character. A description of the connection. For 'file'
              this is a path to the file to be opened.  For 'pipe' it is
              the OS command which is to be run.  For 'url' it is a
              complete URL, including schemes ('http://', 'ftp://' or
              'file://').  'file' also accepts complete URLs.

The 'See Also' section of ?pipe should include a paragraph:

    For 'pipe', see 'system' and 'shQuote'.

In the help page for 'system' the paragraph

    If 'intern' is 'FALSE' then the C function 'system' is used to
    invoke the command and the value returned by 'system' is the exit
    status of this function.

should be followed by a new paragraph:

    While your operating system may allow almost any string as a file
    name, the system command interpreter doesn't.  Some characters,
    such as spaces, quotation marks, and apostrophes, are likely to
    give you trouble.  You should only paste file names into a command
    directly when you are certain that they do not contain any unusual
    characters.  If they are valid R identifiers, you should have no
    trouble.  In general, you should quote file names using shQuote(),
    which knows what needs quoting for your system command interpreter
    and how to do that quoting.

Then in See Also:

    'shQuote' for quoting file names as and when necessary.

Then in Examples:

    file.name <- "What's On"
    quoted.name <- shQuote(file.name)
    system(paste("echo nothing >", quoted.name))
    system(paste("ls -l", quoted.name))
    system(paste("rm", quoted.name))

(This example has been tested.)

Had something like this already been in those help files, I would have
found shQuote when I went looking for it.
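The multi-element idea can also be tried without replacing system() at all, by building the command string in a separate helper and inspecting it first. The following is a minimal sketch under that assumption; build.command is a made-up name, not an R function, and type = "sh" is passed explicitly so that the result does not depend on the platform default of shQuote().

```r
# Hypothetical helper: the first element is taken literally as the command
# word, the remaining elements are quoted and joined with single spaces.
build.command <- function (command, type = "sh") {
    if (length(command) > 1)
        command <- paste(c(command[1], shQuote(command[-1], type = type)),
                         collapse = " ")
    command
}

cmd <- build.command(c("mv", "Foo Bar", "What's On"))
writeLines(cmd)
# prints: mv 'Foo Bar' 'What'\''s On'
```

The resulting string could then be handed to the ordinary system() or pipe(), which keeps the quoting step visible to the caller instead of hiding it inside a redefined system().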