Michael Friendly
2013-Jan-29 13:44 UTC
[R] how to suppress the intercept in an lm()-like formula method?
I'm trying to write a formula method for canonical correlation analysis, that could be called similarly to lm() for a multivariate response: cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...) or perhaps more naturally, cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...) I've adapted the code from lm() to my case, but in this situation, it doesn't make sense to include an intercept, since X & Y are mean centered by default in the computation. In the code below, I can't see where the intercept gets included in the model matrix and therefore how to suppress it. There is a test case at the end, showing that the method fails when called normally, but works if I explicitly use -1 in the formula. I could hack the result of model.matrix(), but maybe there's an easier way? cancor <- function(x, ...) { UseMethod("cancor", x) } cancor.default <- candisc:::cancor # TODO: make cancisc::cancor() use x, y, not X, Y cancor.formula <- function(formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, contrasts = NULL, ...) { cl <- match.call() mf <- match.call(expand.dots = FALSE) m <- match(c("formula", "data", "subset", "weights", "na.action"), names(mf), 0L) mf <- mf[c(1L, m)] mf[[1L]] <- as.name("model.frame") mf <- eval(mf, parent.frame()) mt <- attr(mf, "terms") y <- model.response(mf, "numeric") w <- as.vector(model.weights(mf)) if (!is.null(w) && !is.numeric(w)) stop("'weights' must be a numeric vector") x <- model.matrix(mt, mf, contrasts) # fixup to remove intercept??? z <- if (is.null(w)) cancor.default(x, y, ...) else stop("weights are not yet implemented") # lm.wfit(x, y, w, ...) z$call <- cl z$terms <- mt z } TESTME <- FALSE if (TESTME) { # need to get latest version, 0.6-1 from R-Forge install.packages("candisc", repo="http://R-Forge.R-project.org") library(candisc) data(Rohwer) # this bombs: needs intercept removed cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer) ## Error in chol.default(Rxx) : ## the leading minor of order 1 is not positive definite #this works as is cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ -1 + n + s + ns + na + ss, data=Rohwer) cc ## Canonical correlation analysis of: ## 5 X variables: n, s, ns, na, ss ## with 3 Y variables: SAT, PPVT, Raven ## ## CanR CanRSQ Eigen percent cum ## 1 0.6703 0.44934 0.81599 77.30 77.30 ## 2 0.3837 0.14719 0.17260 16.35 93.65 ## 3 0.2506 0.06282 0.06704 6.35 100.00 ## ## Test of H0: The canonical correlations in the ## current row and all that follow are zero ## ... } -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. & Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
Michael Friendly
2013-Jan-29 14:14 UTC
[R] how to suppress the intercept in an lm()-like formula method?
To partly answer my own question: It wasn't that hard to hack the result of model.matrix() to remove the intercept, remove.intercept <- function(x) { if (colnames(x)[1] == "(Intercept)") { x <- x[,-1] attr(x, "assign") <- attr(x, "assign")[-1] } x } However, the model frame and therefore the model terms stored in the object are wrong, still including the intercept: Browse[1]> str(mt) Classes 'terms', 'formula' length 3 cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss ..- attr(*, "variables")= language list(cbind(SAT, PPVT, Raven), n, s, ns, na, ss) ..- attr(*, "factors")= int [1:6, 1:5] 0 1 0 0 0 0 0 0 1 0 ... .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s" "ns" ... .. .. ..$ : chr [1:5] "n" "s" "ns" "na" ... ..- attr(*, "term.labels")= chr [1:5] "n" "s" "ns" "na" ... ..- attr(*, "order")= int [1:5] 1 1 1 1 1 ..- attr(*, "intercept")= int 1 ..- attr(*, "response")= int 1 ..- attr(*, ".Environment")=<environment: R_GlobalEnv> ..- attr(*, "predvars")= language list(cbind(SAT, PPVT, Raven), n, s, ns, na, ss) ..- attr(*, "dataClasses")= Named chr [1:6] "nmatrix.3" "numeric" "numeric" "numeric" ... .. ..- attr(*, "names")= chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s" "ns" ... Browse[1]> On 1/29/2013 8:44 AM, Michael Friendly wrote:> I'm trying to write a formula method for canonical correlation analysis, > that could be called similarly to lm() for > a multivariate response: > > cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...) > or perhaps more naturally, > cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...) > > I've adapted the code from lm() to my case, but in this situation, it > doesn't make sense to > include an intercept, since X & Y are mean centered by default in the > computation. > > In the code below, I can't see where the intercept gets included in the > model matrix and therefore > how to suppress it. There is a test case at the end, showing that the > method fails when called > normally, but works if I explicitly use -1 in the formula. I could hack > the result of model.matrix(), > but maybe there's an easier way? > > cancor <- function(x, ...) { > UseMethod("cancor", x) > } > > cancor.default <- candisc:::cancor > > # TODO: make cancisc::cancor() use x, y, not X, Y > cancor.formula <- function(formula, data, subset, weights, > na.action, > method = "qr", > model = TRUE, > x = FALSE, y = FALSE, qr = TRUE, > contrasts = NULL, ...) { > > cl <- match.call() > mf <- match.call(expand.dots = FALSE) > m <- match(c("formula", "data", "subset", "weights", "na.action"), > names(mf), 0L) > mf <- mf[c(1L, m)] > > mf[[1L]] <- as.name("model.frame") > mf <- eval(mf, parent.frame()) > > mt <- attr(mf, "terms") > y <- model.response(mf, "numeric") > w <- as.vector(model.weights(mf)) > if (!is.null(w) && !is.numeric(w)) > stop("'weights' must be a numeric vector") > > x <- model.matrix(mt, mf, contrasts) > # fixup to remove intercept??? > z <- if (is.null(w)) > cancor.default(x, y, ...) > else stop("weights are not yet implemented") # lm.wfit(x, y, w, ...) > > z$call <- cl > z$terms <- mt > z > } > > TESTME <- FALSE > if (TESTME) { > > # need to get latest version, 0.6-1 from R-Forge > install.packages("candisc", repo="http://R-Forge.R-project.org") > library(candisc) > > data(Rohwer) > > # this bombs: needs intercept removed > cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, > data=Rohwer) > ## Error in chol.default(Rxx) : > ## the leading minor of order 1 is not positive definite > > #this works as is > cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ -1 + n + s + ns + na + > ss, data=Rohwer) > cc > ## Canonical correlation analysis of: > ## 5 X variables: n, s, ns, na, ss > ## with 3 Y variables: SAT, PPVT, Raven > ## > ## CanR CanRSQ Eigen percent cum > ## 1 0.6703 0.44934 0.81599 77.30 77.30 > ## 2 0.3837 0.14719 0.17260 16.35 93.65 > ## 3 0.2506 0.06282 0.06704 6.35 100.00 > ## > ## Test of H0: The canonical correlations in the > ## current row and all that follow are zero > ## > ... > } > >-- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. & Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
John Fox
2013-Jan-29 14:23 UTC
[R] how to suppress the intercept in an lm()-like formula method?
Hi Michael, How about, x <- x[, colnames(x) != "(Intercept)"] I hope this helps, John> -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > On Behalf Of Michael Friendly > Sent: Tuesday, January 29, 2013 8:45 AM > To: R-help > Subject: [R] how to suppress the intercept in an lm()-like formula > method? > > I'm trying to write a formula method for canonical correlation analysis, > that could be called similarly to lm() for > a multivariate response: > > cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...) > or perhaps more naturally, > cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...) > > I've adapted the code from lm() to my case, but in this situation, it > doesn't make sense to > include an intercept, since X & Y are mean centered by default in the > computation. > > In the code below, I can't see where the intercept gets included in the > model matrix and therefore > how to suppress it. There is a test case at the end, showing that the > method fails when called > normally, but works if I explicitly use -1 in the formula. I could hack > the result of model.matrix(), > but maybe there's an easier way? > > cancor <- function(x, ...) { > UseMethod("cancor", x) > } > > cancor.default <- candisc:::cancor > > # TODO: make cancisc::cancor() use x, y, not X, Y > cancor.formula <- function(formula, data, subset, weights, > na.action, > method = "qr", > model = TRUE, > x = FALSE, y = FALSE, qr = TRUE, > contrasts = NULL, ...) { > > cl <- match.call() > mf <- match.call(expand.dots = FALSE) > m <- match(c("formula", "data", "subset", "weights", "na.action"), > names(mf), 0L) > mf <- mf[c(1L, m)] > > mf[[1L]] <- as.name("model.frame") > mf <- eval(mf, parent.frame()) > > mt <- attr(mf, "terms") > y <- model.response(mf, "numeric") > w <- as.vector(model.weights(mf)) > if (!is.null(w) && !is.numeric(w)) > stop("'weights' must be a numeric vector") > > x <- model.matrix(mt, mf, contrasts) > # fixup to remove intercept??? > z <- if (is.null(w)) > cancor.default(x, y, ...) > else stop("weights are not yet implemented") # lm.wfit(x, y, w, > ...) > > z$call <- cl > z$terms <- mt > z > } > > TESTME <- FALSE > if (TESTME) { > > # need to get latest version, 0.6-1 from R-Forge > install.packages("candisc", repo="http://R-Forge.R-project.org") > library(candisc) > > data(Rohwer) > > # this bombs: needs intercept removed > cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, > data=Rohwer) > ## Error in chol.default(Rxx) : > ## the leading minor of order 1 is not positive definite > > #this works as is > cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ -1 + n + s + ns + na + > ss, data=Rohwer) > cc > ## Canonical correlation analysis of: > ## 5 X variables: n, s, ns, na, ss > ## with 3 Y variables: SAT, PPVT, Raven > ## > ## CanR CanRSQ Eigen percent cum > ## 1 0.6703 0.44934 0.81599 77.30 77.30 > ## 2 0.3837 0.14719 0.17260 16.35 93.65 > ## 3 0.2506 0.06282 0.06704 6.35 100.00 > ## > ## Test of H0: The canonical correlations in the > ## current row and all that follow are zero > ## > ... > } > > > -- > Michael Friendly Email: friendly AT yorku DOT ca > Professor, Psychology Dept. & Chair, Quantitative Methods > York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 > 4700 Keele Street Web: http://www.datavis.ca > Toronto, ONT M3J 1P3 CANADA > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.
Michael Friendly
2013-Jan-29 16:31 UTC
[R] how to suppress the intercept in an lm()-like formula method?
On 1/29/2013 10:11 AM, John Fox wrote:> Hi Michael, > > OK -- I see -- you need to do more than fix up the model matrix. > > How about this? > > formula <- update(formula, . ~ . - 1) > cl <- match.call() > cl$formula <- formula > mf <- match.call(expand.dots = FALSE) > mf$formula <- formula > > Best, > JohnThat's the cleverest solution I've seen. It fixes it early, and avoids later complications / kludges. Thanks very much for this. I'm cc'ing R-help for posterity. best, -Michael> >> -----Original Message----- >> From: Michael Friendly [mailto:friendly at yorku.ca] >> Sent: Tuesday, January 29, 2013 9:36 AM >> To: John Fox >> Subject: Re: [R] how to suppress the intercept in an lm()-like formula >> method? >> >> On 1/29/2013 9:23 AM, John Fox wrote: >>> Hi Michael, >>> >>> How about, >>> >>> x <- x[, colnames(x) != "(Intercept)"] >>> >> Thanks, John >> See my followup post. I'm now using >> >> remove.intercept <- function(x) { >> if (colnames(x)[1] == "(Intercept)") { >> x <- x[,-1] >> attr(x, "assign") <- attr(x, "assign")[-1] >> } >> x >> } >> which works, but other objects in the function (mt & mf) are silently >> wrong, still including the intercept. >> >> -- >> Michael Friendly Email: friendly AT yorku DOT ca >> Professor, Psychology Dept. & Chair, Quantitative Methods >> York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 >> 4700 Keele Street Web: http://www.datavis.ca >> Toronto, ONT M3J 1P3 CANADA >-- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. & Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA