John Sorkin
2006-Nov-05 15:38 UTC
[R] solution to a regression with multiple independent variable
Please forgive a statistics question.

I know that a simple bivariate linear regression, y=f(x), or in R parlance lm(y~x), can be solved using the variance-covariance matrix: beta(x) = covariance(x,y)/variance(x). I also know that a linear regression with multiple independent variables, for example y=f(x,z), can also be solved using the variance-covariance matrix, but I don't know how to do this. Can someone help me go from the variance-covariance matrix to the solution of a regression with multiple independent variables? It is not clear how one applies the matrix solution b = (x'x)^-1 x'y to the elements of the variance-covariance matrix, i.e. how one gets the required values from the variance-covariance matrix.

Any help or suggestions would be appreciated.

Thanks,
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
jsorkin at grecc.umaryland.edu
Charles C. Berry
2006-Nov-05 18:22 UTC
[R] solution to a regression with multiple independent variable
On Sun, 5 Nov 2006, John Sorkin wrote:

> I know that a simple bivariate linear regression, y=f(x) or in R
> parlance lm(y~x) can be solved using the variance-covariance matrix:
> beta(x)=covariance(x,y)/variance(x).
> [...]
> It is not clear how one applies the matrix solution b = (x'x)^-1 x'y
> to the elements of the variance-covariance matrix, i.e. how one gets
> the required values from the variance-covariance matrix.

The "x"s you use above have differing meanings - a possible source of confusion. The "x" in "(x'x)^-1 x'y" is the design matrix, and in the case of a simple linear regression (not "bivariate", BTW) it contains a column of ones and a column of values of the independent variable.

I suggest you review the chapter in Draper and Smith's Applied Regression Analysis where the transition to the matrix algebraic formulation of regression is laid out. IIRC, it is done first for simple linear regression. In concert with this, carry out the computation "longhand" (with the help of R) for the simple linear regression using both formulae. Also do it using a centered version of 'x'.
Here is one version:

> x <- 1:10
> y <- rnorm(10)+x
> cov(x,y)
[1] 10.17249
> var(x)
[1] 9.166667
> X <- cbind(1,x)
> t(X) %*% X
      x
   10  55
x  55 385
> t(X)%*%y
       [,1]
   57.63155
x 408.52594
> cov(x,y)/var(x)
[1] 1.109727
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
    -0.3403       1.1097

> solve( t(X) %*% X ) %*% t(X) %*% y
        [,1]
  -0.3403414
x  1.1097265
> X2 <- cbind( 1, x- mean(x) )
> t(X2) %*% X2
     [,1] [,2]
[1,]   10  0.0
[2,]    0 82.5
> 82.5/9 ### have you seen this before?
[1] 9.166667
> t(X2) %*% y
         [,1]
[1,] 57.63155
[2,] 91.55244
> 91.55244/9 ### or this??
[1] 10.17249
> solve( t(X2) %*% X2 ) %*% t(X2) %*% y
         [,1]
[1,] 5.763155
[2,] 1.109727
> mean(y)
[1] 5.763155
>

Try it again using a centered version of y.

Does this help?

To really get a handle on this, you need to dig into the matrix algebra a bit. Rao's Linear Statistical Inference and Its Applications does this nicely and shows how matrix operations are carried out on the variance-covariance matrices (sorry I don't have the page refs handy, but IIRC it is in a later chapter pertaining to multivariate analysis).

Chuck

Comment: "solve( t(X) %*% X ) %*% t(X) %*% y" is NOT the way production code for regression problems would be written. If you want to see how production code should be written, look at the Fortran source for "dqrls" in the R source code distribution.

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Charles C. Berry                        (858) 534-2098
                                        Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu         UC San Diego
http://biostat.ucsd.edu/~cberry/        La Jolla, San Diego 92093-0717
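The hints in the transcript above (82.5/9 equals var(x), 91.55244/9 equals cov(x,y)) suggest how the single-predictor formula generalizes: for centered data, X'X and X'y are (n-1) times the corresponding blocks of the variance-covariance matrix, and the factor (n-1) cancels. The following is a minimal sketch for two predictors, not a definitive implementation. The simulated data, the second predictor z, the seed, and the names S, Sxx, Sxy, from_cov, from_lm and from_qr are all assumptions made for illustration. The last comparison uses qr() and qr.coef(), which is closer in spirit to the QR-based approach the comment about "dqrls" points to than solving the normal equations directly.

```r
# Simulated data: z, the seed and the "true" coefficients are illustrative assumptions
set.seed(1)
n <- 50
x <- rnorm(n)
z <- 0.5 * x + rnorm(n)
y <- 1 + 2 * x - 3 * z + rnorm(n)

# Variance-covariance matrix and means of (y, x, z)
S <- cov(cbind(y, x, z))
m <- colMeans(cbind(y, x, z))

# Partition S into the predictor block and the predictor/response block
Sxx <- S[c("x", "z"), c("x", "z")]
Sxy <- S[c("x", "z"), "y"]

# Slopes solve Sxx %*% b = Sxy; with one predictor this is cov(x,y)/var(x)
b <- solve(Sxx, Sxy)

# Intercept: the fitted plane passes through the point of means
b0 <- m["y"] - sum(m[c("x", "z")] * b)

from_cov <- c(b0, b)
names(from_cov) <- c("(Intercept)", "x", "z")

# Compare with lm() and with a QR-based least-squares solve
from_lm <- coef(lm(y ~ x + z))
from_qr <- qr.coef(qr(cbind(1, x, z)), y)
print(rbind(from_cov, from_lm, from_qr))
```

The three rows should agree up to floating-point error. Only the means and the variance-covariance matrix are needed for the first row, so the same steps apply when the raw data are not available.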
Pfaff, Bernhard Dr.
2006-Nov-06 09:53 UTC
[R] solution to a regression with multiple independent variable
Hello John,

you can derive these estimators by considering a step-wise approach:

1) Derive the slope estimators by evaluating a model with demeaned variables, i.e. consider (\tilde{X}'\tilde{X})^{-1} \tilde{X}'\tilde{y}, where \tilde{...} refers to the demeaned variables.

2) Obtain the estimate of the intercept by utilising the "Schwerpunkteigenschaft" (centre-of-gravity property) of the OLS estimator: the fitted regression passes through the point of means, so the intercept is the mean of y minus the estimated slopes times the means of the regressors.

HTH,
Bernhard

>Can someone help me go from the variance-covariance
>matrix to the solution of a regression with multiple independent
>variables? It is not clear how one applies the matrix solution
>b = (x'x)^-1 x'y to the elements of the variance-covariance matrix, i.e. how
>one gets the required values from the variance-covariance matrix.
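The two steps Bernhard describes can be sketched in R as follows. This is a minimal illustration under stated assumptions: the data are simulated, and the names Xt, yt, slopes, intercept and two_step are hypothetical, chosen only for this example.

```r
# Simulated data: the seed, sample size and coefficients are illustrative assumptions
set.seed(2)
n <- 40
x <- runif(n)
z <- runif(n)
y <- 0.5 + 1.5 * x - 2 * z + rnorm(n, sd = 0.3)

# Step 1: demean the variables and regress without an intercept
Xt <- scale(cbind(x, z), center = TRUE, scale = FALSE)  # demeaned regressors
yt <- y - mean(y)                                       # demeaned response
# Solves (Xt'Xt) b = Xt'yt, i.e. b = (Xt'Xt)^-1 Xt'yt
slopes <- drop(solve(crossprod(Xt), crossprod(Xt, yt)))

# Step 2: intercept from the centre-of-gravity property
# (the fitted plane passes through the point of means)
intercept <- mean(y) - sum(colMeans(cbind(x, z)) * slopes)

two_step <- c(intercept, slopes)
names(two_step) <- c("(Intercept)", "x", "z")
print(two_step)
print(coef(lm(y ~ x + z)))
```

Dividing both crossprod() terms in step 1 by n - 1 gives cov(cbind(x, z)) and the covariances of x and z with y, which is the link back to the variance-covariance matrix in the original question.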