Michael Friendly
2013-Jan-29 13:44 UTC
[R] how to suppress the intercept in an lm()-like formula method?
I'm trying to write a formula method for canonical correlation analysis,
that could be called similarly to lm() for
a multivariate response:
cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...)
or perhaps more naturally,
cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...)
I've adapted the code from lm() to my case, but in this situation, it
doesn't make sense to
include an intercept, since X & Y are mean centered by default in the
computation.
In the code below, I can't see where the intercept gets included in the
model matrix and therefore
how to suppress it. There is a test case at the end, showing that the
method fails when called
normally, but works if I explicitly use -1 in the formula. I could hack
the result of model.matrix(),
but maybe there's an easier way?
cancor <- function(x, ...) {
UseMethod("cancor", x)
}
cancor.default <- candisc:::cancor
# TODO: make cancisc::cancor() use x, y, not X, Y
cancor.formula <- function(formula, data, subset, weights,
na.action,
method = "qr",
model = TRUE,
x = FALSE, y = FALSE, qr = TRUE,
contrasts = NULL, ...) {
cl <- match.call()
mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset",
"weights", "na.action"),
names(mf), 0L)
mf <- mf[c(1L, m)]
mf[[1L]] <- as.name("model.frame")
mf <- eval(mf, parent.frame())
mt <- attr(mf, "terms")
y <- model.response(mf, "numeric")
w <- as.vector(model.weights(mf))
if (!is.null(w) && !is.numeric(w))
stop("'weights' must be a numeric vector")
x <- model.matrix(mt, mf, contrasts)
# fixup to remove intercept???
z <- if (is.null(w))
cancor.default(x, y, ...)
else stop("weights are not yet implemented") # lm.wfit(x, y, w,
...)
z$call <- cl
z$terms <- mt
z
}
TESTME <- FALSE
if (TESTME) {
# need to get latest version, 0.6-1 from R-Forge
install.packages("candisc",
repo="http://R-Forge.R-project.org")
library(candisc)
data(Rohwer)
# this bombs: needs intercept removed
cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,
data=Rohwer)
## Error in chol.default(Rxx) :
## the leading minor of order 1 is not positive definite
#this works as is
cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ -1 + n + s + ns + na +
ss, data=Rohwer)
cc
## Canonical correlation analysis of:
## 5 X variables: n, s, ns, na, ss
## with 3 Y variables: SAT, PPVT, Raven
##
## CanR CanRSQ Eigen percent cum
## 1 0.6703 0.44934 0.81599 77.30 77.30
## 2 0.3837 0.14719 0.17260 16.35 93.65
## 3 0.2506 0.06282 0.06704 6.35 100.00
##
## Test of H0: The canonical correlations in the
## current row and all that follow are zero
##
...
}
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
Michael Friendly
2013-Jan-29 14:14 UTC
[R] how to suppress the intercept in an lm()-like formula method?
To partly answer my own question: It wasn't that hard to hack the
result of model.matrix() to remove the intercept,
remove.intercept <- function(x) {
if (colnames(x)[1] == "(Intercept)") {
x <- x[,-1]
attr(x, "assign") <- attr(x, "assign")[-1]
}
x
}
However, the model frame and therefore the model terms stored in the
object are wrong, still including the intercept:
Browse[1]> str(mt)
Classes 'terms', 'formula' length 3 cbind(SAT, PPVT, Raven) ~ n
+ s + ns
+ na + ss
..- attr(*, "variables")= language list(cbind(SAT, PPVT, Raven), n,
s, ns, na, ss)
..- attr(*, "factors")= int [1:6, 1:5] 0 1 0 0 0 0 0 0 1 0 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:6] "cbind(SAT, PPVT, Raven)" "n"
"s" "ns" ...
.. .. ..$ : chr [1:5] "n" "s" "ns"
"na" ...
..- attr(*, "term.labels")= chr [1:5] "n" "s"
"ns" "na" ...
..- attr(*, "order")= int [1:5] 1 1 1 1 1
..- attr(*, "intercept")= int 1
..- attr(*, "response")= int 1
..- attr(*, ".Environment")=<environment: R_GlobalEnv>
..- attr(*, "predvars")= language list(cbind(SAT, PPVT, Raven), n,
s,
ns, na, ss)
..- attr(*, "dataClasses")= Named chr [1:6] "nmatrix.3"
"numeric"
"numeric" "numeric" ...
.. ..- attr(*, "names")= chr [1:6] "cbind(SAT, PPVT,
Raven)" "n" "s"
"ns" ...
Browse[1]>
On 1/29/2013 8:44 AM, Michael Friendly wrote:> I'm trying to write a formula method for canonical correlation
analysis,
> that could be called similarly to lm() for
> a multivariate response:
>
> cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...)
> or perhaps more naturally,
> cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...)
>
> I've adapted the code from lm() to my case, but in this situation, it
> doesn't make sense to
> include an intercept, since X & Y are mean centered by default in the
> computation.
>
> In the code below, I can't see where the intercept gets included in the
> model matrix and therefore
> how to suppress it. There is a test case at the end, showing that the
> method fails when called
> normally, but works if I explicitly use -1 in the formula. I could hack
> the result of model.matrix(),
> but maybe there's an easier way?
>
> cancor <- function(x, ...) {
> UseMethod("cancor", x)
> }
>
> cancor.default <- candisc:::cancor
>
> # TODO: make cancisc::cancor() use x, y, not X, Y
> cancor.formula <- function(formula, data, subset, weights,
> na.action,
> method = "qr",
> model = TRUE,
> x = FALSE, y = FALSE, qr = TRUE,
> contrasts = NULL, ...) {
>
> cl <- match.call()
> mf <- match.call(expand.dots = FALSE)
> m <- match(c("formula", "data",
"subset", "weights", "na.action"),
> names(mf), 0L)
> mf <- mf[c(1L, m)]
>
> mf[[1L]] <- as.name("model.frame")
> mf <- eval(mf, parent.frame())
>
> mt <- attr(mf, "terms")
> y <- model.response(mf, "numeric")
> w <- as.vector(model.weights(mf))
> if (!is.null(w) && !is.numeric(w))
> stop("'weights' must be a numeric vector")
>
> x <- model.matrix(mt, mf, contrasts)
> # fixup to remove intercept???
> z <- if (is.null(w))
> cancor.default(x, y, ...)
> else stop("weights are not yet implemented") # lm.wfit(x,
y, w, ...)
>
> z$call <- cl
> z$terms <- mt
> z
> }
>
> TESTME <- FALSE
> if (TESTME) {
>
> # need to get latest version, 0.6-1 from R-Forge
> install.packages("candisc",
repo="http://R-Forge.R-project.org")
> library(candisc)
>
> data(Rohwer)
>
> # this bombs: needs intercept removed
> cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,
> data=Rohwer)
> ## Error in chol.default(Rxx) :
> ## the leading minor of order 1 is not positive definite
>
> #this works as is
> cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ -1 + n + s + ns + na +
> ss, data=Rohwer)
> cc
> ## Canonical correlation analysis of:
> ## 5 X variables: n, s, ns, na, ss
> ## with 3 Y variables: SAT, PPVT, Raven
> ##
> ## CanR CanRSQ Eigen percent cum
> ## 1 0.6703 0.44934 0.81599 77.30 77.30
> ## 2 0.3837 0.14719 0.17260 16.35 93.65
> ## 3 0.2506 0.06282 0.06704 6.35 100.00
> ##
> ## Test of H0: The canonical correlations in the
> ## current row and all that follow are zero
> ##
> ...
> }
>
>
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
John Fox
2013-Jan-29 14:23 UTC
[R] how to suppress the intercept in an lm()-like formula method?
Hi Michael,
How about,
x <- x[, colnames(x) != "(Intercept)"]
I hope this helps,
John
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at
r-project.org]
> On Behalf Of Michael Friendly
> Sent: Tuesday, January 29, 2013 8:45 AM
> To: R-help
> Subject: [R] how to suppress the intercept in an lm()-like formula
> method?
>
> I'm trying to write a formula method for canonical correlation
analysis,
> that could be called similarly to lm() for
> a multivariate response:
>
> cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...)
> or perhaps more naturally,
> cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...)
>
> I've adapted the code from lm() to my case, but in this situation, it
> doesn't make sense to
> include an intercept, since X & Y are mean centered by default in the
> computation.
>
> In the code below, I can't see where the intercept gets included in the
> model matrix and therefore
> how to suppress it. There is a test case at the end, showing that the
> method fails when called
> normally, but works if I explicitly use -1 in the formula. I could hack
> the result of model.matrix(),
> but maybe there's an easier way?
>
> cancor <- function(x, ...) {
> UseMethod("cancor", x)
> }
>
> cancor.default <- candisc:::cancor
>
> # TODO: make cancisc::cancor() use x, y, not X, Y
> cancor.formula <- function(formula, data, subset, weights,
> na.action,
> method = "qr",
> model = TRUE,
> x = FALSE, y = FALSE, qr = TRUE,
> contrasts = NULL, ...) {
>
> cl <- match.call()
> mf <- match.call(expand.dots = FALSE)
> m <- match(c("formula", "data",
"subset", "weights", "na.action"),
> names(mf), 0L)
> mf <- mf[c(1L, m)]
>
> mf[[1L]] <- as.name("model.frame")
> mf <- eval(mf, parent.frame())
>
> mt <- attr(mf, "terms")
> y <- model.response(mf, "numeric")
> w <- as.vector(model.weights(mf))
> if (!is.null(w) && !is.numeric(w))
> stop("'weights' must be a numeric vector")
>
> x <- model.matrix(mt, mf, contrasts)
> # fixup to remove intercept???
> z <- if (is.null(w))
> cancor.default(x, y, ...)
> else stop("weights are not yet implemented") # lm.wfit(x,
y, w,
> ...)
>
> z$call <- cl
> z$terms <- mt
> z
> }
>
> TESTME <- FALSE
> if (TESTME) {
>
> # need to get latest version, 0.6-1 from R-Forge
> install.packages("candisc",
repo="http://R-Forge.R-project.org")
> library(candisc)
>
> data(Rohwer)
>
> # this bombs: needs intercept removed
> cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,
> data=Rohwer)
> ## Error in chol.default(Rxx) :
> ## the leading minor of order 1 is not positive definite
>
> #this works as is
> cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ -1 + n + s + ns + na +
> ss, data=Rohwer)
> cc
> ## Canonical correlation analysis of:
> ## 5 X variables: n, s, ns, na, ss
> ## with 3 Y variables: SAT, PPVT, Raven
> ##
> ## CanR CanRSQ Eigen percent cum
> ## 1 0.6703 0.44934 0.81599 77.30 77.30
> ## 2 0.3837 0.14719 0.17260 16.35 93.65
> ## 3 0.2506 0.06282 0.06704 6.35 100.00
> ##
> ## Test of H0: The canonical correlations in the
> ## current row and all that follow are zero
> ##
> ...
> }
>
>
> --
> Michael Friendly Email: friendly AT yorku DOT ca
> Professor, Psychology Dept. & Chair, Quantitative Methods
> York University Voice: 416 736-2100 x66249 Fax: 416 736-5814
> 4700 Keele Street Web: http://www.datavis.ca
> Toronto, ONT M3J 1P3 CANADA
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
Michael Friendly
2013-Jan-29 16:31 UTC
[R] how to suppress the intercept in an lm()-like formula method?
On 1/29/2013 10:11 AM, John Fox wrote:> Hi Michael, > > OK -- I see -- you need to do more than fix up the model matrix. > > How about this? > > formula <- update(formula, . ~ . - 1) > cl <- match.call() > cl$formula <- formula > mf <- match.call(expand.dots = FALSE) > mf$formula <- formula > > Best, > JohnThat's the cleverest solution I've seen. It fixes it early, and avoids later complications / kludges. Thanks very much for this. I'm cc'ing R-help for posterity. best, -Michael> >> -----Original Message----- >> From: Michael Friendly [mailto:friendly at yorku.ca] >> Sent: Tuesday, January 29, 2013 9:36 AM >> To: John Fox >> Subject: Re: [R] how to suppress the intercept in an lm()-like formula >> method? >> >> On 1/29/2013 9:23 AM, John Fox wrote: >>> Hi Michael, >>> >>> How about, >>> >>> x <- x[, colnames(x) != "(Intercept)"] >>> >> Thanks, John >> See my followup post. I'm now using >> >> remove.intercept <- function(x) { >> if (colnames(x)[1] == "(Intercept)") { >> x <- x[,-1] >> attr(x, "assign") <- attr(x, "assign")[-1] >> } >> x >> } >> which works, but other objects in the function (mt & mf) are silently >> wrong, still including the intercept. >> >> -- >> Michael Friendly Email: friendly AT yorku DOT ca >> Professor, Psychology Dept. & Chair, Quantitative Methods >> York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 >> 4700 Keele Street Web: http://www.datavis.ca >> Toronto, ONT M3J 1P3 CANADA >-- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. & Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA