Dear all,

while looking at some R-code submitted by students in a unit that I
teach, I came across constructs that I thought would lead to an error.
Much to my surprise, the code is actually executed.

A boiled down version of the code is the following:

> tt <- function(x, i){
+   mean(x[i,2])/mean(x[i,1])
+ }
> dat <- matrix(rnorm(200), ncol=2)
> mean(dat[,2])/mean(dat[,1])
[1] -1.163893
> dat1 <- data.frame(dat)
> tt(dat1)   ### Why does this work?
[1] -1.163893
> tt(dat)
Error in mean(x[i, 2]) : argument "i" is missing, with no default

Since the data for the assignment was in a data frame, the students got
an answer and not an error message when they called the equivalent of
tt(dat1) in their work.

I tested this code on R 1.8.1, 1.9.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1,
2.2.0 and R-devel (2005-11-14 r36330), all with the same result: no
error message when executing tt(dat1).

I would have expected that tt(dat1) behaves in the same way as tt(dat)
and would produce an error. Thus, I think it is a bug, but the fact
that so many R versions accept this code makes me wonder whether it is
a misunderstanding on my side. Can somebody enlighten me why this
code is working?

Cheers,

        Berwin

On 11/15/05, Berwin A Turlach <berwin at maths.uwa.edu.au> wrote:
> [...]
> A boiled down version of the code is the following:
>
> > tt <- function(x, i){
> +   mean(x[i,2])/mean(x[i,1])
> + }
> > dat <- matrix(rnorm(200), ncol=2)
> > mean(dat[,2])/mean(dat[,1])
> [1] -1.163893
> > dat1 <- data.frame(dat)
> > tt(dat1)   ### Why does this work?
> [1] -1.163893
> > tt(dat)
> Error in mean(x[i, 2]) : argument "i" is missing, with no default
>
> [...]
> Can somebody enlighten me why this code is working?

I don't have a complete explanation, but consider:

f <- function(x) missing(x)
g <- function(x) f(x)
g() # TRUE

That is, in R one can pass a missing argument from one function to
another, and that is evidently what is happening with tt, which passes
the missing i on to [.data.frame. The weird part, to me, is that [
does not also allow this, even though it does allow empty arguments.
Likely it's due to [ being written in C and [.data.frame being written
in R. Try

getAnywhere("[.data.frame")
getAnywhere("[")
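
The point about missingness being handed on can be tried in isolation. The following is only a minimal sketch: the names inner, outer, strict and outer_strict are made up for illustration and appear nowhere in the thread. It contrasts a callee that tests missing() before touching the argument with one that evaluates it straight away.

```r
# Hypothetical helper names, for illustration only.

# inner() tests missing(i) before using i, so it never evaluates it.
inner <- function(x, i) {
  if (missing(i)) "i is missing" else "i was supplied"
}
outer <- function(x, i) inner(x, i)

res_missing  <- outer(1:3)      # i never supplied; missingness is handed on
res_supplied <- outer(1:3, 2)   # i supplied

# strict() evaluates i immediately, so a missing i fails at the point of use.
strict <- function(x, i) x[[i + 0L]]
outer_strict <- function(x, i) strict(x, i)

res_err <- tryCatch(outer_strict(1:3),
                    error = function(e) conditionMessage(e))

print(res_missing)
print(res_supplied)
print(res_err)
```

Whether a call like tt(dat1) fails therefore depends on whether the code that finally receives i checks missing(i) or evaluates i.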

On Wed, 16 Nov 2005, Berwin A Turlach wrote:
> [...]
> A boiled down version of the code is the following:
>
>> tt <- function(x, i){
> +   mean(x[i,2])/mean(x[i,1])
> + }
>> dat <- matrix(rnorm(200), ncol=2)
>> mean(dat[,2])/mean(dat[,1])
> [1] -1.163893
>> dat1 <- data.frame(dat)
>> tt(dat1)   ### Why does this work?
> [1] -1.163893
>> tt(dat)
> Error in mean(x[i, 2]) : argument "i" is missing, with no default
>
> [...]
> Can somebody enlighten me why this code is working?

[.data.frame is interpreted, [ is internal for a matrix. The issue is
what happens to x[i,2] where i is missing. In [.data.frame you find

    if(missing(i)) { # df[, j] or df[ , ]
        ## handle the column only subsetting
        ...
        if(!missing(j)) x <- x[j]
        cols <- names(x)
        if(any(is.na(cols))) stop("undefined columns selected")
    }
    ...

so it was deliberate, it seems. I believe S used to do the same thing
in its S3 days, but it appears this is now an error. However,
missingness is an area of S/R differences (mainly undocumented, I
think). Currently in S there is

> args("[.data.frame")
function(x, ..., drop = T)

whereas in R

> args("[.data.frame")
function (x, i, j, drop = ....

Since [ is primitive you cannot use args() on it, but its argument
list is more like S's (which is f(x, ..., drop = T) for the generic
and all methods). I don't believe this would be easy to change if we
wanted to.

Similar things happen for the replacement method.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
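
To make the missing(i) branch concrete, here is a small sketch under stated assumptions. The class "toy", its method [.toy, and the helpers pick and tt_strict are hypothetical and written only for this illustration; they are not the actual [.data.frame code, merely an imitation of the branch quoted above. The last part shows one way a function like tt could refuse a missing i itself instead of relying on what the subsetting method does.

```r
# Hypothetical S3 class "toy" whose `[` method imitates the missing(i)
# branch quoted from [.data.frame (this is not the real implementation).
"[.toy" <- function(x, i, j) {
  m <- unclass(x)                 # plain matrix underneath
  if (missing(i)) m[, j] else m[i, j]
}

toy  <- structure(matrix(1:6, nrow = 3), class = "toy")
pick <- function(x, i) x[i, 2]    # same shape as x[i, 2] inside tt

print(pick(toy))                  # i missing: whole second column
print(pick(toy, 1))               # i supplied: a single element

# The function from the thread, applied to a data frame.
tt <- function(x, i) mean(x[i, 2]) / mean(x[i, 1])
set.seed(1)
dat1 <- data.frame(matrix(rnorm(200), ncol = 2))
print(tt(dat1))                   # same as mean(dat1[, 2]) / mean(dat1[, 1])

# Defensive variant (hypothetical name): check i before subsetting.
tt_strict <- function(x, i) {
  if (missing(i)) stop("please supply the row index 'i'")
  mean(x[i, 2]) / mean(x[i, 1])
}
err <- tryCatch(tt_strict(dat1), error = function(e) conditionMessage(e))
print(err)
```

How a plain matrix behaves in the same situation is decided by the internal code for [ and may differ between versions of R; the thread reports an error for the versions listed in the original post.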