Hello,

Is this a bug or a feature? I am using R 2.7.1 on Apple OS X.

> y <- matrix(1:3,nrow=3) # y is a single-column matrix
> df <-data.frame(x=1:3,y=y)
> sapply(df,data.class)
        x         y
"numeric" "numeric"
> df$yy <- y
> sapply(df,data.class)
        x         y        yy
"numeric" "numeric"  "matrix"

I'm not sure why dataframes are allowed to have matrices as members. It's also weird to me that y & yy have different classes. It seems like there has been a blurring of the line between lists and dataframes. When did dataframes start taking members other than vectors?

This is an issue if one, for example, builds a dataframe to fit a model and then subsequently wants to use predict. You have to work a bit to avoid a type mismatch error.

> df$out = df$x+df$y+df$yy + rnorm(3)
> df
  x y yy       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+yy,data=df)
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=1:3))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
>
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=matrix(1:3)))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
> predict(glmout,newdata=df[,-4])
        1         2         3
 2.548387  6.551939 10.555491
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
  prediction from a rank-deficient fit may be misleading

I'm not really looking for a "solution", as I can already identify several workarounds. I guess I'm mainly trying to figure out what the philosophy is here.
This is also weird to me:

> df$yy <- as.data.frame(y)
> df
  x y V1       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+V1,data=df)
Error in eval(expr, envir, enclos) : object "V1" not found
> glmout = glm(out~x+y+yy,data=df)
Error in model.frame.default(formula = out ~ x + y + yy, data = df, drop.unused.levels = TRUE) :
  invalid type (list) for variable 'yy'
> glmout = glm(out~x+y+yy$VI,data=df)
Error in model.frame.default(formula = out ~ x + y + yy$VI, data = df, :
  invalid type (NULL) for variable 'yy$VI'

Is it impossible to build a model from a dataframe built this way?

thanks,
Daryl Morris
(Biostatistics, Univ. of Washington)
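On the last question: the printed data frame shows a column V1, but V1 lives inside the component yy and is not a top-level name, which is why the formula cannot find it. The following is only a minimal sketch, assuming the behaviour of current R versions (not checked against R 2.7.1); the name d is hypothetical, and promoting the nested column is just one possible workaround.

```r
y <- matrix(1:3, nrow = 3)

d <- data.frame(x = 1:3)
d$yy <- as.data.frame(y)   # a whole data frame stored as one component

names(d)      # top-level names are "x" and "yy"; V1 is not among them
names(d$yy)   # "V1" is only visible one level down

# Promote the nested column to a top-level column, then drop the nested frame.
d$V1 <- d$yy$V1
d$yy <- NULL

# The formula machinery can now see V1 as an ordinary variable.
mf <- model.frame(~ x + V1, data = d)
```

After this step, V1 can be used in a glm formula like any other column.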
Bill.Venables at csiro.au
2008-Aug-13 05:03 UTC
[R] issue building dataframes with matrices.
It's a feature and it's been there forever. (It's even present in another system not unlike R.)

Suppose you set

y <- matrix(1:3)

and construct

dfr <- data.frame(x=1:3, y)

Then you invoke the constructor function, data.frame, which by default simplifies things like matrices to single columns, naming them as necessary.

Now if you directly modify dfr by adding another component, like

dfr$yy <- y

you bypass the constructor function and its default simplifications, but you do not bypass the structure tests. This is, in fact, the simplest way to put a matrix inside a data frame intact, but it must have the same number of rows as has the data frame itself.

There are other ways of getting a matrix into a data frame intact, and sometimes it is mildly useful to do this. Consider, for example, the following:

dfr <- within(data.frame(x = 1:5), {
    y <- rbinom(5, 100, plogis((x-3)/2))
    SF <- cbind(S = y, F = 100-y)
    rm(y)
})
names(dfr)   ### Note the apparent discrepancy
dfr          ### with the printed version.
(fm <- glm(SF ~ x, binomial, dfr))

Bill Venables
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Daryl Morris
Sent: Wednesday, 13 August 2008 11:31 AM
To: r-help at r-project.org
Subject: [R] issue building dataframes with matrices.
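The difference between the constructor path and the direct-assignment path described in the reply can be checked with a few lines. This is only a sketch, assuming current R behaviour; the names yv, dfr2, fit_df, z, fit and pred are hypothetical, and flattening with as.vector() is one possible way to avoid the predict type mismatch, not necessarily the recommended one.

```r
y <- matrix(1:3, nrow = 3)

# Constructor path: data.frame() simplifies the one-column matrix to a vector.
dfr <- data.frame(x = 1:3, y = y)
is.matrix(dfr$y)     # FALSE

# Direct assignment: the matrix is stored intact as a single component.
dfr$yy <- y
is.matrix(dfr$yy)    # TRUE

# Flattening before assignment gives an ordinary vector column.
dfr$yv <- as.vector(y)
is.matrix(dfr$yv)    # FALSE

# I() asks the constructor to keep the matrix as is.
dfr2 <- data.frame(x = 1:3, y = I(y))
is.matrix(dfr2$y)    # TRUE

# With plain vector columns, predict() accepts a freshly built newdata frame.
set.seed(1)
fit_df <- data.frame(x = 1:10, z = as.vector(matrix(rnorm(10))))
fit_df$out <- fit_df$x + fit_df$z + rnorm(10)
fit <- glm(out ~ x + z, data = fit_df)
pred <- predict(fit, newdata = data.frame(x = 1:3, z = c(0, 0.5, 1)))
```

Keeping model variables as plain vectors means the types recorded at fit time match whatever data.frame() produces later for newdata.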
Daryl Morris
2009-Apr-24 05:42 UTC
[R] issue building my own package... moving from Apple OS to Windows
Hello,

I have written my own very simple package. On an Apple, I was able to run through "R CMD build" and "R CMD check" successfully. I have also installed the package and successfully loaded the library on my Apple. This package is written entirely in R and requires no compilation.

I am trying to move the package to a Windows machine. I (perhaps naively) thought, given that it contains no code requiring compilation, that I should just be able to take the .tar.gz file and directly install it in Windows. This didn't work. Nor did "translating" the contents of the .tar.gz file into a .zip file.

I was about to provide the errors, but more googling on this issue suggests that maybe what I'm trying to do is impossible. Do I really have to "build" on a Windows box, even when the package requires no compilation? Is there no simple translation tool available for this case?

thank you,
Daryl
U. Washington Biostatistics
Hello,

I'm trying to fit a mixed-effects model with a single binary predictor (case/control status in my case), a random intercept (e.g. dependent on radiologist) and also a random slope (a per-radiologist difference between cases and controls). I know how to do that, but what I don't know how to do is both of

(1) allowing the variance to be different for cases and controls
(2) forcing the random effects to be independent

By "both", I mean:

(1) Using lme (from nlme library) I know how to use varGroup as described in Pinheiro & Bates chapter 5, but in that library, I don't know how to force the random effects to be independent.
(2) Using lmer (from lme4 library) I can force the random effects to be independent (using a description published by Bates in the R magazine in 2005) but I don't know how to allow the variance to depend on group.

To be clear, the model I want to fit is:

Y_{ij} ~ beta_0 + beta_1*disease_{ij} + b_i0 + b_i1*disease_{ij} + error_{ij}

where b_i0 and b_i1 are independent Normal, and where

error_{ij} = Normal(0, sd_case) if disease_{ij} = 1
error_{ij} = Normal(0, sd_control) if disease_{ij} = 2

i is an indicator of radiologist... a single radiologist does multiple cases and multiple controls.

Thanks,
Daryl
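One possible way to combine both requirements in nlme is sketched below, using pdDiag() for a diagonal random-effects covariance (independent intercept and slope) and varIdent() for a separate residual standard deviation per group. This is only a sketch under stated assumptions: the data are simulated, the names sim, reader, status, b0, b1 and fit are hypothetical, disease is coded 0/1 here instead of the 1/2 coding in the post, and nothing in the thread confirms that this is the intended approach.

```r
library(nlme)

# Simulated data (an assumption for illustration): 20 radiologists,
# each reading 10 controls (disease = 0) and 10 cases (disease = 1).
set.seed(42)
n_reader <- 20
n_each <- 10
sim <- expand.grid(rep = seq_len(n_each),
                   disease = c(0, 1),
                   reader = seq_len(n_reader))

b0 <- rnorm(n_reader, sd = 1)               # random intercepts, b_i0
b1 <- rnorm(n_reader, sd = 1)               # independent random slopes, b_i1
sd_err <- ifelse(sim$disease == 1, 2, 1)    # sd_case = 2, sd_control = 1

sim$Y <- 1 + 0.5 * sim$disease +
  b0[sim$reader] + b1[sim$reader] * sim$disease +
  rnorm(nrow(sim), sd = sd_err)

sim$reader <- factor(sim$reader)
sim$status <- factor(ifelse(sim$disease == 1, "case", "control"))

fit <- lme(Y ~ disease,
           random = list(reader = pdDiag(~ disease)),   # no intercept-slope covariance
           weights = varIdent(form = ~ 1 | status),     # residual sd differs by status
           data = sim)

VarCorr(fit)   # random-effect and residual standard deviations
```

Printing fit$modelStruct$varStruct shows the estimated ratio of the two residual standard deviations relative to the reference level.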