Hi all, Could you please help me? I am trying to understand why this line works: lm1x = lm(y~X-1, tmp) Here it seems that I was combining the design matrix and the data frame... And X below is not a single column, in fact, it's a bunch of columns in matrix form... I don't understand why this line works... Is it just luck, i.e. if we change the data-set and/or formulas to something else, this will potentially fail? (that's something I would like to catch and avoid...) Thank you! ------------------------------- The data is located at: http://www.ling.uni-potsdam.de/~vasishth/book.html In the section: downloadable errata, code, datasets •ERRATA •Corrected Pages •VasishthBroebook.R •beauty.txt •mathachieve.txt •mathachschool.txt ------------------------------ MathAchieve <- read.table("mathachieve.txt") colnames(MathAchieve) <- c("School", "Minority", "Sex", "SES", "MathAch", "MEANSES") head(MathAchieve) MathAchSchool <- read.table("mathachschool.txt") colnames(MathAchSchool) <- c("School", "Size", "Sector", "PRACAD", "DISCLIM", "HIMINTY", "MEANSES") MathScores <- merge(MathAchieve, MathAchSchool, by = "School") lm1 = lm(MathAch ~ SES + factor(Sector) , MathScores) X=model.matrix(MathAch ~ SES+factor(Sector) , MathScores) y=MathScores$MathAch tmp=MathScores tmp$y=y tmp$X=X lm1x = lm(y~X-1, tmp) plot(fitted(lm1), fitted(lm1x)) max(abs(fitted(lm1) - fitted(lm1x))) [[alternative HTML version deleted]]
peter dalgaard
2012-May-12 08:31 UTC
[R] Why can we combine design matrix and data-frame in R?
On May 12, 2012, at 04:40 , Michael wrote:> Hi all, > > Could you please help me? > > I am trying to understand why this line works: > > lm1x = lm(y~X-1, tmp) > > Here it seems that I was combining the design matrix and the data frame... > > And X below is not a single column, in fact, it's a bunch of columns in > matrix form... > > I don't understand why this line works...It works because you can have a matrix on the right hand side of a model formula and it will be interpreted as a set of columns... If X is constructed via model.matrix it usually contains an all-1 column, which is why you need to remove the intercept with "-1". A data frame can contain matrices, you just need to be careful that some functions, notably data.frame(), which will split a matrix into its constituent columns, unless protected with I(). Notice the difference between these two examples:> d <- data.frame(a=1:3,m=matrix(4:9,,2)) > d$mNULL> names(d)[1] "a" "m.1" "m.2"> d <- data.frame(a=1:3,m=I(matrix(4:9,,2))) > d$m[,1] [,2] [1,] 4 7 [2,] 5 8 [3,] 6 9> names(d)[1] "a" "m" (d <- data.frame(a=1:3) ; d$m <- matrix(4:9,,2) is like the 2nd version) Your example has the weakness that you do "tmp$X <- X" so that you have X both in the data frame tmp and in the global workspace, and it is really not clear that the one in the data frame is the one used. I am pretty sure that this is in fact the case, but for your peace of mind, you should try renaming one of them.> > Is it just luck, i.e. if we change the data-set and/or formulas to > something else, this will potentially fail? > (that's something I would like to catch and avoid...) > > Thank you! > > ------------------------------- > > The data is located at: > http://www.ling.uni-potsdam.de/~vasishth/book.html > In the section: > downloadable errata, code, datasets > ?ERRATA > ?Corrected Pages > ?VasishthBroebook.R > ?beauty.txt > ?mathachieve.txt > ?mathachschool.txt > > ------------------------------ > > MathAchieve <- read.table("mathachieve.txt") > > colnames(MathAchieve) <- c("School", "Minority", "Sex", "SES", "MathAch", > "MEANSES") > > head(MathAchieve) > > > MathAchSchool <- read.table("mathachschool.txt") > colnames(MathAchSchool) <- c("School", "Size", "Sector", "PRACAD", > "DISCLIM", "HIMINTY", "MEANSES") > MathScores <- merge(MathAchieve, MathAchSchool, by = "School") > > > lm1 = lm(MathAch ~ SES + factor(Sector) , MathScores) > > X=model.matrix(MathAch ~ SES+factor(Sector) , MathScores) > y=MathScores$MathAch > > tmp=MathScores > tmp$y=y > tmp$X=X > > lm1x = lm(y~X-1, tmp) > > plot(fitted(lm1), fitted(lm1x)) > > max(abs(fitted(lm1) - fitted(lm1x))) > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
>I am trying to understand why this line works: > > lm1x = lm(y~X-1, tmp)Well, I would not normally define a data frame element as a matrix myself (though I might well define a list element as one). But specifying a matrix as the terms part of an lm is documented in lm's details: "If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix" So _something_ will happen. Whether the something is useful depends on the intent.> Here it seems that I was combining the design matrix and the data frame...Did you inspect tmp after adding the design matrix? Was it an odd looking data frame or a list? What seems to have been done is that the design matrix has been added to a list. I wouldn't normally do that if tmp is a data frame, and r would not do so unless the lengths all matched. But a list should be ok. And lm takes a list or environment as its data argument, so a list of things will work even if they are different types. In other words tmp is just a ragbag of things, each of which lm understands. ******************************************************************* This email and any attachments are confidential. Any use...{{dropped:8}}