I may be missing something obvious here, but consider the following simple
dataset simulating repeated measures on 5 individuals with pretty strong
between-individual variance.

set.seed(1003)
n<-5
v<-rep(1:n,each=2)
d<-data.frame(factor(v),v+rnorm(2*n))
names(d)<-c("id","y")

Now consider the following two linear models that provide identical fitted
values, residuals, and estimated residual variance:

m1<-lm(y~id,data=d)
m2<-lm(y~id-1,data=d)
print(max(abs(fitted(m1)-fitted(m2))))

The r-squared reported by summary(m1) appears to be correct in that it is
equal to the squared correlation between the fitted and observed values:

print(summary(m1)$r.squared - cor(fitted(m1),d$y)^2)

However, the same is not true of m2.

print(summary(m2)$r.squared - cor(fitted(m2),d$y)^2)

> R.version
         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    9.0
year     2004
month    04
day      12
language R

J.R. Lockwood
412-683-2300 x4941
lockwood at rand.org
http://www.rand.org/methodology/stat/members/lockwood/

J.R. Lockwood wrote:
> [...]
> However, the same is not true of m2.
>
> print(summary(m2)$r.squared - cor(fitted(m2),d$y)^2)

I think what you're trying to do is better accomplished by looking at the
anova table of the two results:

a1 <- anova(m1)
a2 <- anova(m2)
r2.1 <- a1[1, 2]/sum(a1[, 2])
r2.2 <- a2[1, 2]/sum(a2[, 2])
summary(m1)$r.squared - r2.1
summary(m2)$r.squared - r2.2

The result you used above using "cor" still adjusts your data for the grand
mean, which m2 doesn't fit.

HTH,

--sundar
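
The point about the grand mean can also be checked directly. The following is
only a sketch, not part of the original posts. It relies on the definition
given in ?summary.lm, where R^2 is 1 - RSS/TSS and the total sum of squares is
taken about the mean of y when the model has an intercept and about zero
otherwise. The names rss, tss_centered, tss_uncentered, r2_centered and
r2_uncentered are made up for illustration.

```r
# Rebuild the data and the two models from the original post.
set.seed(1003)
n <- 5
v <- rep(1:n, each = 2)
d <- data.frame(id = factor(v), y = v + rnorm(2 * n))
m1 <- lm(y ~ id, data = d)      # with intercept
m2 <- lm(y ~ id - 1, data = d)  # without intercept

# The residual sum of squares is the same for both fits.
rss <- sum(residuals(m1)^2)

# Two candidate total sums of squares.
tss_centered   <- sum((d$y - mean(d$y))^2)  # about the grand mean
tss_uncentered <- sum(d$y^2)                # about zero

r2_centered   <- 1 - rss / tss_centered
r2_uncentered <- 1 - rss / tss_uncentered

# Both differences should be zero up to rounding error.
print(summary(m1)$r.squared - r2_centered)
print(summary(m2)$r.squared - r2_uncentered)

# The squared correlation matches the centered version for either model.
print(cor(fitted(m2), d$y)^2 - r2_centered)
```

If this runs as expected, the value reported for m2 is the uncentered R^2,
which is why it cannot agree with the squared correlation even though the
fitted values of the two models are identical.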