Hello,
Is this a bug or a feature? I am using R 2.7.1 on Apple OS X.
> y <- matrix(1:3,nrow=3) # y is a single-column matrix
> df <- data.frame(x=1:3,y=y)
> sapply(df,data.class)
        x         y
"numeric" "numeric"
> df$yy <- y
> sapply(df,data.class)
        x         y        yy
"numeric" "numeric"  "matrix"
I'm not sure why dataframes are allowed to have matrices as members.
It's also weird to me that y & yy have different classes. It seems like
there has been a blurring of the line between lists and dataframes.
When did dataframes start taking members other than vectors?
This is an issue if one for example builds a dataframe to fit a model,
and then subsequently wants to use predict. You have to work a bit to
avoid a type mismatch error.
> df$out = df$x+df$y+df$yy + rnorm(3)
> df
  x y yy       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+yy,data=df)
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=1:3))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=matrix(1:3)))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
> predict(glmout,newdata=df[,-4])
        1         2         3
 2.548387  6.551939 10.555491
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
  prediction from a rank-deficient fit may be misleading
I'm not really looking for a "solution", as I can already identify
several workarounds. I guess I'm mainly trying to figure out what the
philosophy is here.
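For reference, the type mismatch above and two possible workarounds can be sketched as follows. This is only an illustration on simulated data: the names `train`, `z`, `fit_m`, `fit_v`, `new_m`, `p_m` and `p_v` are hypothetical and not from the session above, and ten rows are used instead of three to avoid the rank-deficient fit.

```r
set.seed(42)                                     # hypothetical seed
n <- 10
train <- data.frame(x = rnorm(n))
train$z <- matrix(rnorm(n), ncol = 1)            # matrix column, like df$yy <- y
train$out <- train$x + as.vector(train$z) + rnorm(n)

# Workaround 1: give newdata a matrix column built the same way
fit_m <- glm(out ~ x + z, data = train)
new_m <- data.frame(x = c(0, 1))
new_m$z <- matrix(c(0.5, -0.5), ncol = 1)
p_m <- predict(fit_m, newdata = new_m)

# Workaround 2: flatten the column before fitting, then use ordinary newdata
train_v <- train
train_v$z <- as.vector(train_v$z)
fit_v <- glm(out ~ x + z, data = train_v)
p_v <- predict(fit_v, newdata = data.frame(x = c(0, 1), z = c(0.5, -0.5)))

# Both routes should give the same predictions
all.equal(as.numeric(p_m), as.numeric(p_v))
```

Flattening before the fit is probably the less surprising choice, since the fitted model then accepts a plain data.frame() as newdata.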
This is also weird to me:
> df$yy <- as.data.frame(y)
> df
  x y V1       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+V1,data=df)
Error in eval(expr, envir, enclos) : object "V1" not found
> glmout = glm(out~x+y+yy,data=df)
Error in model.frame.default(formula = out ~ x + y + yy, data = df, drop.unused.levels = TRUE) :
  invalid type (list) for variable 'yy'
> glmout = glm(out~x+y+yy$VI,data=df)
Error in model.frame.default(formula = out ~ x + y + yy$VI, data = df, :
  invalid type (NULL) for variable 'yy$VI'
Is it impossible to build a model from a dataframe built this way?
thanks, Daryl Morris
(Biostatistics, Univ. of Washington)
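On the last question in the message above, a minimal sketch of one way to make such a data frame usable for modelling is to pull the nested column back out as a plain vector. The data values and the name `fit` are made up for illustration, and this is one possible approach, not necessarily the one the poster had in mind.

```r
# Hypothetical five-row example with a nested one-column data frame
df <- data.frame(x = 1:5)
df$yy <- as.data.frame(matrix(c(2, 1, 4, 3, 5)))
names(df)         # the top level holds "x" and "yy" only
names(df$yy)      # the nested data frame holds "V1"

# Made-up response values, purely for illustration
df$out <- c(1.1, 2.3, 2.9, 4.2, 4.8)

# Replace the nested data frame by its single column, stored as a vector
df$yy <- df$yy[[1]]
fit <- glm(out ~ x + yy, data = df)
coef(fit)
```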
Bill.Venables at csiro.au
2008-Aug-13 05:03 UTC
[R] issue building dataframes with matrices.
It's a feature and it's been there forever. (It's even present in
another system not unlike R.)
Suppose you set
y <- matrix(1:3)
and construct
dfr <- data.frame(x=1:3, y)
Then you invoke the constructor function, data.frame, which by default
simplifies things like matrices to single columns, naming them as
necessary.
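The simplification described above can be checked with a short sketch. The two-column matrix `m` and the name `dfr2` are made up for illustration and are not part of the original example.

```r
# Single-column matrix, as in the original question
y <- matrix(1:3, nrow = 3)
dfr <- data.frame(x = 1:3, y)
is.matrix(dfr[[2]])   # FALSE: the constructor stored y as a plain vector

# Hypothetical two-column matrix: each matrix column becomes its own variable
m <- cbind(a = 1:3, b = 4:6)
dfr2 <- data.frame(x = 1:3, m)
ncol(dfr2)            # 3: x plus one column per matrix column
```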
Now if you directly modify dfr by adding another component, like
dfr$yy <- y
you bypass the constructor function and its default simplifications, but
you do not bypass the structure tests. This is, in fact, the simplest
way to put a matrix inside a data frame intact, but it must have the
same number of rows as the data frame itself.
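A minimal sketch of the points above: direct assignment keeps the matrix, and the row-count check still applies. The use of I() is an extra illustration of one of the "other ways" and is not taken from the message itself; the names `dfr_i`, `bad` and `res` are hypothetical.

```r
dfr <- data.frame(x = 1:3)
y <- matrix(1:3, nrow = 3)

# Direct assignment bypasses the constructor and keeps the matrix intact
dfr$yy <- y
is.matrix(dfr$yy)     # TRUE

# Wrapping in I() asks the constructor itself to leave the matrix alone
dfr_i <- data.frame(x = 1:3, yy = I(y))
is.matrix(dfr_i$yy)   # TRUE

# The structure test still applies: a 4-row matrix does not fit 3 rows
res <- try(dfr$bad <- matrix(1:4, nrow = 4), silent = TRUE)
inherits(res, "try-error")   # TRUE
```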
There are other ways of getting a matrix into a data frame intact, and
sometimes it is mildly useful to do this. Consider, for example, the
following:
dfr <- within(data.frame(x = 1:5), {
    y <- rbinom(5, 100, plogis((x-3)/2))
    SF <- cbind(S = y, F = 100-y)
    rm(y)
})
names(dfr)   ### Note the apparent discrepancy
dfr          ### with the printed version.
(fm <- glm(SF ~ x, binomial, dfr))
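The discrepancy mentioned in the comments can be inspected directly. The following sketch repeats the example with a fixed seed (the seed is an addition, only for reproducibility) and looks at the structure rather than the printed form.

```r
set.seed(1)  # hypothetical seed, not in the original example
dfr <- within(data.frame(x = 1:5), {
  y  <- rbinom(5, 100, plogis((x - 3) / 2))
  SF <- cbind(S = y, F = 100 - y)
  rm(y)
})

names(dfr)         # only two top-level components
dim(dfr$SF)        # the SF component is itself a 5 x 2 matrix
colnames(dfr$SF)   # "S" "F"; printing dfr labels these SF.S and SF.F

# The two-column matrix serves directly as a binomial response
fm <- glm(SF ~ x, family = binomial, data = dfr)
coef(fm)
```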
Bill Venables
http://www.cmis.csiro.au/bill.venables/
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Daryl Morris
Sent: Wednesday, 13 August 2008 11:31 AM
To: r-help at r-project.org
Subject: [R] issue building dataframes with matrices.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Daryl Morris
2009-Apr-24 05:42 UTC
[R] issue building my own package... moving from Apple OS to Windows
Hello,
I have written my own very simple package. On an Apple, I was able to run
through "R CMD build" and "R CMD check" successfully. I have also installed
the package and successfully loaded the library on my Apple. This package is
written entirely in R and requires no compilation.
I am trying to move the package to a Windows machine. I (perhaps naively)
thought, given that it contains no code requiring compilation, that I should
just be able to take the .tar.gz file and directly install it in Windows.
This didn't work. Nor did "translating" the contents of the .tar.gz file
into a .zip file.
I was about to provide the errors, but more googling on this issue suggests
that maybe what I'm trying to do is impossible. Do I really have to "build"
on a Windows box ... even when the package requires no compilation? Is there
no simple translation tool available for this case?
thank you,
Daryl
U. Washington Biostatistics
Hello,
I'm trying to fit a mixed-effects model with a single binary predictor
(case/control status in my case), a random intercept (e.g. dependent on
radiologist) and also a random slope (a per-radiologist difference
between cases and controls).
I know how to do that, but what I don't know how to do is both (1)
allow the variance to be different for cases and controls and (2) force
the random effects to be independent.
By "both", I mean:
(1) Using lme (from nlme library) I know how to use varIdent as
described in Pinheiro & Bates chapter 5, but in that library, I don't
know how to force the random effects to be independent.
(2) Using lmer (from lme4 library) I can force the random effects to be
independent (using a description published by Bates in the R magazine in
2005) but I don't know how to allow the variance to depend on group.
To be clear, the model I want to fit is:
Y_{ij} = beta_0 + beta_1*disease_{ij} + b_{i0} + b_{i1}*disease_{ij} +
error_{ij}
where b_{i0} and b_{i1} are independent Normal
where error_{ij} ~ Normal(0, sd_case) if disease_{ij} = 1
      error_{ij} ~ Normal(0, sd_control) if disease_{ij} = 2
i is an indicator of radiologist... a single radiologist does multiple
cases and multiple controls.
Thanks, Daryl
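One possible way to combine both requirements in nlme is sketched below: pdDiag() gives a diagonal random-effects covariance (independent intercept and slope), and varIdent() allows a separate residual standard deviation per group. This has not been checked against the poster's data. The simulated data set `dat`, its column names, the sample sizes and all parameter values are assumptions made for illustration, and `disease` is coded 0/1 here rather than 1/2 as in the model statement above.

```r
library(nlme)

set.seed(123)                      # hypothetical seed
n_rad <- 30                        # assumed number of radiologists
n_per <- 20                        # assumed readings per radiologist

# Simulated data: half controls (0) and half cases (1) per radiologist
dat <- data.frame(
  radiologist = factor(rep(seq_len(n_rad), each = n_per)),
  disease     = rep(rep(c(0, 1), each = n_per / 2), times = n_rad)
)
r  <- as.integer(dat$radiologist)
b0 <- rnorm(n_rad, sd = 1)         # random intercepts
b1 <- rnorm(n_rad, sd = 1)         # random slopes, drawn independently of b0
sd_err <- ifelse(dat$disease == 1, 1.0, 0.5)   # group-specific residual SD
dat$Y <- 2 + 1 * dat$disease + b0[r] + b1[r] * dat$disease +
  rnorm(nrow(dat), sd = sd_err)
dat$status <- factor(dat$disease, levels = c(0, 1),
                     labels = c("control", "case"))

fit <- lme(
  Y ~ disease,
  random  = list(radiologist = pdDiag(~ disease)),  # no intercept-slope covariance
  weights = varIdent(form = ~ 1 | status),          # residual SD differs by group
  data    = dat
)

VarCorr(fit)                  # variances of b_{i0}, b_{i1} and the residual
fit$modelStruct$varStruct     # estimated ratio of the two residual SDs
```

With pdDiag() the intercept-slope correlation is fixed at zero rather than estimated, which is what "independent" means under the Normal assumption in the model statement.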