John Sorkin
2006-Oct-14 20:26 UTC
[R] regression analyses using a vector of means and a variance-covariance matrix
R 2.2.0
windows XP
How can I perform a regression analyses using a vector of means, a
variance-covariance matrix? I looked at the help screen for lm and did
not see any option for using the afore mentioned structures as input to
lm.
Thanks,
John
John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence
University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
jsorkin at grecc.umaryland.edu
Confidentiality Statement:
This email message, including any attachments, is for the so...{{dropped}}
John Fox
2006-Oct-14 21:15 UTC
[R] regression analyses using a vector of means anda variance-covariance matrix
Dear John,
I'm not aware of an existing function that will do what you want
conveniently, but it's not hard to write one:
reg <- function(V, means, y, x, n){
# V: covariance matrix of variables
# means: vector of means
# y: index of response in V
# x: indices of regressors in V
SSP <- (n - 1)*V + n*outer(means, means)
SSP <- cbind(n*means, SSP)
SSP <- rbind(c(n, n*means), SSP)
XtXinv <- solve(SSP[c(1,x+1), c(1,x+1)])
b <- as.vector(XtXinv %*% SSP[c(1,x+1), y+1])
R2 <- as.vector(V[y,x] %*% solve(V[x,x]) %*% V[x,y])/V[y,y]
s2 <- (1 - R2)*V[y,y]*(n - 1)/(n - length(x) - 1)
Vb <- s2*XtXinv
list(coef=b, vcov=Vb, R2=R2, s2=s2, SSP=SSP)
}
For example,
> library(car)
> data <- Duncan[,c("prestige", "income",
"education")]
> reg(var(data), colMeans(data), 1, 2:3, nrow(data))
$coef
[1] -6.0646629 0.5987328 0.5458339
$vcov
income education
18.2494814 -0.15184501 -0.150706025
income -0.1518450 0.01432027 -0.008518551
education -0.1507060 -0.00851855 0.009653582
$R2
[1] 0.8281734
$s2
[1] 178.7309
$SSP
prestige income education
45 2146 1884 2365
prestige 2146 146028 118229 147936
income 1884 118229 105148 122197
education 2365 147936 122197 163265
> summary(lm(prestige ~ income + education, data=Duncan)) # check
Call:
lm(formula = prestige ~ income + education, data = Duncan)
Residuals:
Min 1Q Median 3Q Max
-29.5380 -6.4174 0.6546 6.6051 34.6412
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.06466 4.27194 -1.420 0.163
income 0.59873 0.11967 5.003 1.05e-05 ***
education 0.54583 0.09825 5.555 1.73e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
Residual standard error: 13.37 on 42 degrees of freedom
Multiple R-Squared: 0.8282, Adjusted R-squared: 0.82
F-statistic: 101.2 on 2 and 42 DF, p-value: < 2.2e-16
Some caveats: This function will work OK for well conditioned data, since
I'm inverting the matrix of sums of squares and products; one could take a
more sophisticated approach. As well, there's a bit of redundancy in the
computation of R2, but I didn't have time to figure out how to avoid it.
Finally, there's an obvious advantage to computing on the original data --
e.g., one can get residuals.
I hope this helps,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of John Sorkin
> Sent: Saturday, October 14, 2006 3:27 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] regression analyses using a vector of means anda
> variance-covariance matrix
>
> R 2.2.0
> windows XP
>
> How can I perform a regression analyses using a vector of
> means, a variance-covariance matrix? I looked at the help
> screen for lm and did not see any option for using the afore
> mentioned structures as input to lm.
> Thanks,
> John
>
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> Baltimore VA Medical Center GRECC,
> University of Maryland School of Medicine Claude D. Pepper
> OAIC, University of Maryland Clinical Nutrition Research
> Unit, and Baltimore VA Center Stroke of Excellence
>
> University of Maryland School of Medicine Division of
> Gerontology Baltimore VA Medical Center 10 North Greene
> Street GRECC (BT/18/GR) Baltimore, MD 21201-1524
>
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to
> faxing) jsorkin at grecc.umaryland.edu
>
> Confidentiality Statement:
> This email message, including any attachments, is for the\...{{dropped}}
Gabor Grothendieck
2006-Oct-14 22:32 UTC
[R] regression analyses using a vector of means and a variance-covariance matrix
Here is another approach using the same data as in John Fox's reply. His is probably superior but this does have the advantage that its very simple. Note that it gives the same coefficients and R squared to several decimal places. We just simulate a data set with the given means and variance covariance matrix> library(car) # for Duncan > library(MASS) # for mvrnorm > set.seed(1) > DF <- mvrnorm(10000, colMeans(data), var(data), empirical = TRUE) > summary(lm(prestige ~ income + education, as.data.frame(DF)))Call: lm(formula = prestige ~ income + education, data = as.data.frame(DF)) Residuals: Min 1Q Median 3Q Max -49.0377 -8.7332 0.2068 8.8342 50.2737 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -6.064663 0.277576 -21.85 <2e-16 *** income 0.598733 0.007756 77.19 <2e-16 *** education 0.545834 0.006368 85.71 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 13.06 on 9997 degrees of freedom Multiple R-Squared: 0.8282, Adjusted R-squared: 0.8281 F-statistic: 2.409e+04 on 2 and 9997 DF, p-value: < 2.2e-16 On 10/14/06, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:> R 2.2.0 > windows XP > > How can I perform a regression analyses using a vector of means, a > variance-covariance matrix? I looked at the help screen for lm and did > not see any option for using the afore mentioned structures as input to > lm. > Thanks, > John > > John Sorkin M.D., Ph.D. > Chief, Biostatistics and Informatics > Baltimore VA Medical Center GRECC, > University of Maryland School of Medicine Claude D. Pepper OAIC, > University of Maryland Clinical Nutrition Research Unit, and > Baltimore VA Center Stroke of Excellence > > University of Maryland School of Medicine > Division of Gerontology > Baltimore VA Medical Center > 10 North Greene Street > GRECC (BT/18/GR) > Baltimore, MD 21201-1524 > > (Phone) 410-605-7119 > (Fax) 410-605-7913 (Please call phone number above prior to faxing) > jsorkin at grecc.umaryland.edu > > Confidentiality Statement: > This email message, including any attachments, is for the so...{{dropped}} > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >