thr3ads.net - R help - [R] R's basic lm() and summary.lm functions [May 2013]


ivo welch

2013-May-10 22:46 UTC

[R] R's basic lm() and summary.lm functions

dear R experts:

I am contemplating the logic in R's lm() and summary(lm()) statements.
 the reason is that I want to extend the functionality of lm to give
me both standardized coefficients and newey-west standard errors and
Ts.  I have the code and can stick it at the end of the lm() function
(and for others, I have included my amateurish coding below)---it
works.  the big advantage of having the code directly in lm() is that
it frees me from having to reconstruct the X matrix, which is not
trivial given aliases, cross-terms, etc.  but I now have some more
philosophical questions.  what is the "R way" of extending linear
models?  and why is not everything from the summary.lm object in the
lm object to begin with?

I first thought that lm() is trying to store only the basic
information, summary.lm() adds to it, and print.summary.lm() formats
it for the screen.  but this is not really true.
> x=rnorm(1000); y=rnorm(1000); z=rnorm(1000)
> object.size( lm( y ~ x*z ))
271552 bytes
> object.size( summary(lm( y ~ x*z )))
73912 bytes

the summary.lm object carries less information than the lm object.
looking at the code in summary.lm(), there does not seem to be
anything that would not easily fit into lm() itself.  moreover,
summary.lm must work without x=TRUE in the lm() invocation. so, doing
(my) calculation inside summary.lm() seems harder than doing it inside
lm().
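A minimal sketch of why summary.lm() can work without x=TRUE: the lm object already stores the QR decomposition of the model matrix, and the classical standard errors can be rebuilt from it. The variable names and the seed below are illustrative assumptions, not part of the original post; model.matrix() is shown as the standard way to recover X after the fact.

```r
set.seed(1)
x <- rnorm(100); z <- rnorm(100); y <- rnorm(100)
fit <- lm(y ~ x * z)                      # note: no x=TRUE

p      <- fit$rank
p1     <- seq_len(p)
XtXinv <- chol2inv(fit$qr$qr[p1, p1, drop = FALSE])   # (X'X)^{-1} from the stored R factor
sigma2 <- sum(residuals(fit)^2) / fit$df.residual     # residual variance
se     <- sqrt(diag(XtXinv) * sigma2)                 # classical OLS standard errors

# same numbers as the "Std. Error" column that summary() reports
all.equal(unname(se), unname(coef(summary(fit))[, "Std. Error"]))  # TRUE

# the X matrix can be rebuilt from the fitted object when it is needed
X <- model.matrix(fit)
dim(X)
```

So a calculation that needs X does not have to live inside lm(); it can start from the fitted object.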

in this case, why not have the contents of summary.lm() inside lm() to
begin with?  it would eliminate one layer of complexity.  if the user
does not want standard errors because calculation could take too much
time (why? where?), then the lm() function could have a
standard.errors=TRUE argument.

so, primarily I don't understand the logic why R has both an lm and a
summary.lm object.  secondarily, I don't understand why my own
additions should not go into lm(), given that I think the other
coefficients (standard errors, etc.) should, too.  third, should I
worry that R's built-in lm() function could change so I should not
replace it?  (without a hook in the end, I don't think lm() is easy to
wrap.  I need the X matrix.  I could write a wrapper function to
invoke lm() with x=TRUE, then do what I need to do, and then remove
'$x' from the lm object.  I would just carry more bytes along during
my calculations.)
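One possible way around the x=TRUE bookkeeping, sketched under the assumption that the fit keeps its model frame (the lm() default): model.matrix() and model.response() return the X matrix and the response exactly as lm() used them, with NA rows already dropped. The helper name xy_from_lm and the toy data are hypothetical.

```r
# hypothetical helper: pull X, y and the estimable coefficients out of a fitted lm
xy_from_lm <- function(fit) {
  X    <- model.matrix(fit)                  # rows lm() actually used (NAs omitted)
  y    <- model.response(model.frame(fit))   # the response, whatever it is called
  keep <- !is.na(coef(fit))                  # aliased terms have NA coefficients
  list(X = X[, keep, drop = FALSE], y = y, beta = coef(fit)[keep])
}

set.seed(2)
d <- data.frame(x = rnorm(12), z = rnorm(12), y = rnorm(12))
d$x[2] <- NA                                 # one missing observation
fit   <- lm(y ~ x * z, data = d)
parts <- xy_from_lm(fit)

dim(parts$X)                                 # 11 4
# X %*% beta reproduces the fitted values
all.equal(unname(drop(parts$X %*% parts$beta)), unname(fitted(fit)))
```

A wrapper built this way never has to add or remove '$x' on the lm object.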

I hope I am not imposing too much here...

/iaw

--------------

## standardized coefficients, newey-west standard errors, and a hook
## for further enhancements

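    ## y is the response that lm() has already extracted from the model frame;
    ## mf$y would only work if the response variable happens to be named "y"
    if (stdcoefs) z$stdcoefs <- z$coefficients*apply(x,2,sd)/sd(y)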

    if (newey.west>=0) {

      x.na.omitted <- x
      r.na.omitted <- residuals(z)
      diagband.matrix <- function(m, ar.terms) {
        ## TRUE within ar.terms of the diagonal, FALSE elsewhere.
        ## (the earlier two-block assignment overlapped for m > 3*ar.terms+3
        ## and left part of the upper triangle TRUE by mistake)
        abs(outer(seq_len(m), seq_len(m), "-")) <= ar.terms
      }
      invx <- chol2inv(chol(crossprod(x.na.omitted)))
      invx.x <- invx %*% t(x.na.omitted)
      if (newey.west==0)
        resid.matrix <- diag(r.na.omitted^2)
      else {
        full <- r.na.omitted %*% t(r.na.omitted)
        maskmatrix <- diagband.matrix(length(r.na.omitted), newey.west)
        resid.matrix <- full * maskmatrix
      }
      vmat <- invx.x %*% resid.matrix %*% t(invx.x)

      z$nw <- newey.west  ## the number of AR terms
      z$nw.se <- sqrt(diag(vmat))  ## the standard errors
    }

    ## the hook has access to everything that lm() has already computed
    if (!is.null(lm.hook)) lm.hook()

----
Ivo Welch (ivo.welch at gmail.com)

ivo welch

2013-May-10 23:44 UTC


[R] R's basic lm() and summary.lm functions

I ended up wrapping my own new "ols()" function in the end.  it is my
replacement for lm() and summary.lm.  this way, I don't need to alter
internals.  in case someone else needs it, it is included.  of course,
feel free to ignore.


docs[["ols"]] <- c(Rd= '
@TITLE ols.R
@AUTHOR ivo.welch at gmail.com
@DATE 2013
@DESCRIPTION
  adds newey-west and standardized coefficients to the lm function,
and adds the summary.lm information at the same time.
@USAGE ols(..., newey.west=0, stdcoefs=TRUE)
@ARGUMENTS
@DETAILS
@SEEALSO
@EXAMPLES
', test= '
  x <- rnorm(12); y <- rnorm(12); z <- rnorm(12); x[2] <-NA;
  ols( y ~ x + z )
', changes= '
')

ols <- function (..., x = FALSE, newey.west=(0), stdcoefs=TRUE) {

  ## R is painfully error-tolerant. I prefer reasonable and immediate
  ## error warnings.
  ## each argument must be a single value of the right type
  ## (the earlier A & B | C form passed whenever the type check alone held)
  stopifnot( is.numeric(newey.west), length(newey.west)==1 )
  stopifnot( is.logical(stdcoefs), length(stdcoefs)==1 )
  stopifnot( is.logical(x), length(x)==1 )
  ## I wish I could check lm()'s argument, but I cannot.

  lmo <- lm(..., x=TRUE)
  ## note that both the x matrix and the residuals from the model have
  ## their NA's omitted by default

  if (newey.west>=0) {
    resids <- residuals(lmo)
    diagband.matrix <- function(m, ar.terms) {
      ## TRUE within ar.terms of the diagonal, FALSE elsewhere.
      ## (the earlier two-block assignment overlapped for m > 3*ar.terms+3
      ## and left part of the upper triangle TRUE by mistake)
      abs(outer(seq_len(m), seq_len(m), "-")) <= ar.terms
    }
    invx <- chol2inv(chol(crossprod(lmo$x)))
    invx.x <- invx %*% t(lmo$x)
    if (newey.west==0)
      resid.matrix <- diag(resids^2)
    else {
      full <- resids %*% t(resids)
      maskmatrix <- diagband.matrix(length(resids), newey.west)
      resid.matrix <- full * maskmatrix
    }
    vmat <- invx.x %*% resid.matrix %*% t(invx.x)

    nw <- newey.west  ## the number of AR terms
    nw.se <- sqrt(diag(vmat))  ## the standard errors
  }

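  ## use the response from the model frame; lmo$model$y only exists when
  ## the response variable is literally named "y"
  if (stdcoefs) stdcoefs.v <-
lmo$coefficients*apply(lmo$x,2,sd)/sd(model.response(model.frame(lmo)))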

  full.x.matrix <- if (x) lmo$x else NULL
  lmo <- summary(lmo)  ## the summary.lm object
  if (x) lmo$x <- full.x.matrix

  if (stdcoefs) {
    lmo$coefficients <- cbind(lmo$coefficients, stdcoefs.v )
    colnames(lmo$coefficients)[ncol(lmo$coefficients)] <-
"stdcoefs"
  }

  if (newey.west>=0) {
    lmo$coefficients <- cbind(lmo$coefficients, nw.se)
    colnames(lmo$coefficients)[ncol(lmo$coefficients)] <-
paste0("se.nw(",newey.west,")")
    lmo$coefficients <- cbind(lmo$coefficients, lmo$coefficients[,1]/nw.se)
    colnames(lmo$coefficients)[ncol(lmo$coefficients)] <-
paste0("Tse.nw(",newey.west,")")
  }

  lmo
}
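
For comparison, the textbook Newey-West estimator weights the lag-j terms by Bartlett weights 1 - j/(L+1) rather than by a 0/1 band mask, which keeps the estimated covariance matrix positive semi-definite. The sketch below is one way to write that, assuming default NA handling and no aliased coefficients; the function name nw_se and the toy data are hypothetical. The sandwich package's NeweyWest() is the usual packaged route, but it applies its own defaults, so its numbers need not match this sketch.

```r
# hypothetical helper: Bartlett-weighted (Newey-West) standard errors for an lm fit
nw_se <- function(fit, lag = 0) {
  X <- model.matrix(fit)                          # design matrix, NA rows already dropped
  e <- residuals(fit)
  n <- length(e)
  d <- abs(outer(seq_len(n), seq_len(n), "-"))    # |i - j| for every pair of observations
  w <- 1 - d / (lag + 1)                          # Bartlett weights ...
  w[w < 0] <- 0                                   # ... truncated at zero beyond the lag
  meat  <- crossprod(X, (tcrossprod(e) * w) %*% X)  # X' (ee' * w) X
  bread <- chol2inv(chol(crossprod(X)))           # (X'X)^{-1}
  sqrt(diag(bread %*% meat %*% bread))
}

set.seed(3)
x <- rnorm(50); z <- rnorm(50); y <- 1 + x + rnorm(50)
fit <- lm(y ~ x + z)
nw_se(fit, lag = 0)   # lag 0 reduces to White heteroskedasticity-consistent errors
nw_se(fit, lag = 2)
```

The n-by-n weight matrix is fine for small samples; for long series one would sum the lagged cross-products instead of forming it.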
