Christoph Scherber
2013-Sep-03 09:51 UTC
[R] Multiple regression (with interactions) by hand
Dear all, I?ve played around with the "airquality" dataset, trying to solve the matrix equations of a simple multiple regression by hand; however, my matrix multiplications don?t lead to the estimates returned by coef(). What have I done wrong here? ## m1=lm(Ozone~Solar.R*Wind,airquality) # remove NA?s: airquality2=airquality[complete.cases(airquality$Ozone)& complete.cases(airquality$Solar.R)& complete.cases(airquality$Wind),] # create the model matrix by hand: X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind) # is the same as: model.matrix(m1) # create the response vector by hand: Y=airquality2$Ozone # is the same as: m1$model$Ozone # Now solve for the parameter estimates: library(MASS) ginv(t(X)%*%X)%*%t(X)%*%Y # is not the same as: coef(m1) ## Now why is my result (line ginv(...)) not the same as the one returned by coef(m1)? Thanks very much for your help! Best regards, Christoph [using R 3.0.1 on Windows 7 32-Bit] -- PD Dr Christoph Scherber Georg-August University Goettingen Department of Crop Science Agroecology Grisebachstrasse 6 D-37077 Goettingen Germany phone 0049 (0)551 39 8807 fax 0049 (0)551 39 8806 http://www.gwdg.de/~cscherb1
Hi Christoph, Use this matrix expression instead: solve(crossprod(X)) %*% t(X) %*% Y Note that: all.equal(crossprod(X), t(X) %*% X) Cheers, Joshua On Tue, Sep 3, 2013 at 2:51 AM, Christoph Scherber <Christoph.Scherber at agr.uni-goettingen.de> wrote:> Dear all, > > I?ve played around with the "airquality" dataset, trying to solve the matrix equations of a simple > multiple regression by hand; however, my matrix multiplications don?t lead to the estimates returned > by coef(). What have I done wrong here? > > ## > m1=lm(Ozone~Solar.R*Wind,airquality) > > # remove NA?s: > airquality2=airquality[complete.cases(airquality$Ozone)& > complete.cases(airquality$Solar.R)& > complete.cases(airquality$Wind),] > > # create the model matrix by hand: > X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind) > # is the same as: > model.matrix(m1) > > # create the response vector by hand: > Y=airquality2$Ozone > # is the same as: > m1$model$Ozone > > # Now solve for the parameter estimates: > library(MASS) > ginv(t(X)%*%X)%*%t(X)%*%Y > > # is not the same as: > coef(m1) > > ## > Now why is my result (line ginv(...)) not the same as the one returned by coef(m1)? > > Thanks very much for your help! > > Best regards, > Christoph > > [using R 3.0.1 on Windows 7 32-Bit] > > > > > > -- > PD Dr Christoph Scherber > Georg-August University Goettingen > Department of Crop Science > Agroecology > Grisebachstrasse 6 > D-37077 Goettingen > Germany > phone 0049 (0)551 39 8807 > fax 0049 (0)551 39 8806 > http://www.gwdg.de/~cscherb1 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com