thr3ads.net - R help - [R] Regression with factor having 1 level [Mar 2016]

Robert McGehee

2016-Mar-11 01:03 UTC

Here's an example for clarity:
> df <- data.frame(y=c(0,2,4,6,8), x1=c(1,1,2,2,NA),
+                  x2=factor(c("A","A","A","A","B")))
> resid(lm(y~x1+x2, data=df, na.action=na.exclude))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

Note that the x2 factor variable contains two levels, but the "B" level is
excluded from the regression because of the NA value in x1. Hence the error.
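
The mechanism can be checked directly. The sketch below assumes that lm() builds its model frame with drop.unused.levels = TRUE (which is what produces the one-level factor); the explicit model.frame() call is only an illustration and is not code from the original post.

```r
df <- data.frame(y  = c(0, 2, 4, 6, 8),
                 x1 = c(1, 1, 2, 2, NA),
                 x2 = factor(c("A", "A", "A", "A", "B")))

# In the raw data the factor has two levels
levels(df$x2)

# Build the model frame the way lm() is assumed to: rows with NA are
# removed first, then factor levels that no longer occur are dropped
mf <- model.frame(y ~ x1 + x2, data = df, na.action = na.exclude,
                  drop.unused.levels = TRUE)

nrow(mf)        # the row with x1 = NA is gone
levels(mf$x2)   # only "A" is left, so contrasts cannot be built
```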

Instead of the above error, I would like a function that returns the
residuals of the regression without the offending term, which in this case
would be equivalent to:
> resid(lm(y~x1, data=df, na.action=na.exclude))
 1  2  3  4  5
-1  1 -1  1 NA

Note that the 5th residual is NA because x1 is NA in that row, which is
what I meant by "maintains all NA values".

I'm currently leaning towards rewriting model.matrix.default so that it
removes offending terms rather than giving an error, but if someone has done
this already (or something more elegant), that would of course be preferred
:)
--Robert
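
One possible alternative to modifying model.matrix.default is to prune the formula before fitting. The following is a minimal sketch, not a tested general solution: the function name resid_drop_single_level is hypothetical, it assumes a two-sided formula, and it drops every term (including interactions) that involves a factor, character or logical variable with fewer than two distinct values among the complete cases.

```r
# Hypothetical helper: residuals of lm() after removing terms that
# involve a categorical variable with fewer than two observed values
resid_drop_single_level <- function(formula, data) {
  mf <- model.frame(formula, data = data, na.action = na.exclude,
                    drop.unused.levels = TRUE)
  omit <- attr(mf, "na.action")   # rows removed for NAs, NULL if none
  tt <- terms(mf)

  # Flag categorical predictors that are constant after NA removal
  is_cat <- function(x) is.factor(x) || is.character(x) || is.logical(x)
  single <- vapply(mf[-1L],
                   function(x) is_cat(x) && length(unique(x)) < 2L, NA)
  bad <- names(single)[single]

  # Drop every term that uses one of the flagged variables
  labels <- attr(tt, "term.labels")
  if (length(bad) > 0L && length(labels) > 0L) {
    fac <- attr(tt, "factors")
    uses_bad <- colSums(fac[bad, , drop = FALSE] > 0) > 0
    labels <- labels[!uses_bad]
  }
  if (length(labels) == 0L) labels <- "1"
  f2 <- reformulate(labels, response = formula[[2L]],
                    intercept = attr(tt, "intercept") == 1)

  # Refit on the same rows as the original model frame, then pad the
  # residuals with NA for the excluded rows
  rows <- if (is.null(omit)) seq_len(nrow(data)) else -as.integer(omit)
  fit <- lm(f2, data = data[rows, , drop = FALSE])
  naresid(omit, resid(fit))
}

df <- data.frame(y  = c(0, 2, 4, 6, 8),
                 x1 = c(1, 1, 2, 2, NA),
                 x2 = factor(c("A", "A", "A", "A", "B")))
res <- resid_drop_single_level(y ~ x1 + x2, data = df)
res
```

This fits the model only once, at the cost of building the model frame twice, so it may or may not be fast enough for large data sets.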

On Thu, Mar 10, 2016 at 7:39 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> > On Mar 10, 2016, at 2:00 PM, Robert McGehee <rmcgehee at gmail.com> wrote:
> >
> > Hello R-helpers,
> > I'd like a function that given an arbitrary formula and a data frame
> > returns the residual of the dependent variable, and maintains all NA
> > values.
>
> What does "maintains all NA values" actually mean?
> >
> > Here's an example that will give me what I want if my formula is
> > y~x1+x2+x3 and my data frame is df:
> >
> > resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
> >
> > Here's the catch: I do not want my function to ever fail due to a factor
> > with only one level. A one-level factor may appear because 1) the user
> > passed it in, or 2) (more common) only one level of a factor in a term is
> > left after na.exclude removes the rows containing NA values.
> >
> > Here is the error I would get
>
> From what code?
>
>
> > above if one of the terms was a factor with
> > one level:
> > Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> >  contrasts can be applied only to factors with 2 or more levels
>
> Unable to create that error with the actions you describe but do not
> actually offer in coded form:
>
>
> > dfrm <- data.frame(y=rnorm(10), x1=rnorm(10), x2=TRUE, x3=rnorm(10))
> > lm(y~x1+x2+x3, dfrm)
>
> Call:
> lm(formula = y ~ x1 + x2 + x3, data = dfrm)
>
> Coefficients:
> (Intercept)           x1       x2TRUE           x3
>    -0.16274     -0.30032           NA     -0.09093
>
> > resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
>           1           2           3           4           5           6
> -0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239
>           7           8           9          10
> -0.05965653 -2.17480605  1.42917190 -0.65103650
>
> >
>
>
> > Instead of giving me an error, I'd like the function to do just what lm()
> > normally does when it sees a variable with no variance: ignore the
> > variable (coefficient is NA) and continue to regress out all the other
> > variables. Thus if 'x2' is a factor with one level in the above example,
> > I'd like the function to return the result of:
> > resid(lm(y~x1+x3, data=df, na.action=na.exclude))
> > Can anyone provide me a straightforward recommendation for how to do
> > this? I feel like it should be easy, but I'm honestly stuck, and my Google
> > searching for this hasn't gotten anywhere. The key is that I'd like the
> > solution to be generic enough to work with an arbitrary linear formula,
> > and not substantially kludgy (like trying every combination of regression
> > terms until one works), as I'll be running this a lot on big data sets and
> > don't want my computation time swamped by running unnecessary regressions
> > or checking for number of factors after removing NAs.
> >
> > Thanks in advance!
> > --Robert
> >
> >
> > PS. The Google search feature in the R-help archives appears to be down:
> > http://tolstoy.newcastle.edu.au/R/
>
> It's working for me.
>
>
> David Winsemius
> Alameda, CA, USA

peter dalgaard

2016-Mar-11 08:40 UTC

> On 11 Mar 2016, at 02:03 , Robert McGehee <rmcgehee at gmail.com> wrote:
>
>> df <- data.frame(y=c(0,2,4,6,8), x1=c(1,1,2,2,NA),
>>                  x2=factor(c("A","A","A","A","B")))
>> resid(lm(y~x1+x2, data=df, na.action=na.exclude))
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Robert McGehee

2016-Mar-11 15:03 UTC

Hi,
In case this is helpful for anyone, I think I've coded a satisfactory
function answering my problem (of handling formulas containing 1-level
factors) by hacking liberally at the model.matrix code to remove any
model terms for which the contrast fails. As it's a problem I've come
across a lot (since my data frames have factors and lots of missing
values), adding support for 1-level factors might be a nice item for
the R Wishlist. I suppose a key question is, does anyone ever _want_
to see the error "contrasts can be applied only to factors with 2 or
more levels", or should the contrasts function just add a column of
all zeros (or ones) to the design matrix and let the modelling
functions handle that the same way they handle any other zero-variance
term?
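
For the example data, something close to the "column of all zeros" behaviour can be had without touching the contrasts code. This is a sketch under one assumption: model.frame() is called with its default drop.unused.levels = FALSE, so the unused "B" level stays in the factor. It does not help when the user passes in a factor that has only one level to begin with.

```r
df <- data.frame(y  = c(0, 2, 4, 6, 8),
                 x1 = c(1, 1, 2, 2, NA),
                 x2 = factor(c("A", "A", "A", "A", "B")))

# Unused levels are kept, so x2 still has two levels and the contrasts
# step succeeds; the x2B column of the design matrix is all zeros
mf <- model.frame(y ~ x1 + x2, data = df, na.action = na.exclude)
X  <- model.matrix(terms(mf), mf)

# lm.fit() treats the zero column like any other aliased term:
# its coefficient is NA and the remaining terms are fitted as usual
fit <- lm.fit(X, model.response(mf))
fit$coefficients

# Pad the residuals with NA for the rows removed by na.exclude
res <- naresid(attr(mf, "na.action"), fit$residuals)
res
```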

Anyway, my function below:

lmresid <- function(formula, data) {
    # Build the model frame; na.exclude records which rows were dropped
    mf <- model.frame(formula, data=data, na.action=na.exclude)
    omit <- attr(mf, "na.action")
    t <- terms(mf)
    contr.funs <- as.character(getOption("contrasts"))
    namD <- names(mf)
    # Treat character columns as factors, as model.matrix.default does
    for (i in namD) if (is.character(mf[[i]]))
        mf[[i]] <- factor(mf[[i]])
    isF <- vapply(mf, function(x) is.factor(x) || is.logical(x), NA)
    isF[1] <- FALSE                      # skip the response
    isOF <- vapply(mf, is.ordered, NA)
    for (nn in namD[isF])
        if (is.null(attr(mf[[nn]], "contrasts"))) {
            noCntr <- try(contrasts(mf[[nn]]) <- contr.funs[1 + isOF[nn]],
                          silent=TRUE)
            if (inherits(noCntr, "try-error")) {
                # Remove term from model on error
                mf[[nn]] <- NULL
                t <- terms(update(t, as.formula(paste("~ . -", nn))),
                           data=mf)
            }
        }
    # Internal, unexported entry point used by model.matrix.default
    ans <- .External2(stats:::C_modelmatrix, t, mf)
    r   <- .lm.fit(ans, mf[[1]])$residuals
    # Pad with NA for excluded rows; naresid() also copes with omit = NULL
    naresid(omit, r)
}

## Note that lmresid now returns the same values as resid with the
## 1-level factor removed.
df <- data.frame(y=c(0,2,4,6,8), x1=c(1,1,2,2,NA),
                 x2=factor(c("A","A","A","A","B")))
lmresid(y~x1+x2, data=df)
resid(lm(y~x1, data=df, na.action=na.exclude))

--Robert

PS, Peter, wasn't sure if you also meant to add comments, but they
didn't come through.

On Fri, Mar 11, 2016 at 3:40 AM, peter dalgaard <pdalgd at gmail.com> wrote:
>
>> On 11 Mar 2016, at 02:03 , Robert McGehee <rmcgehee at gmail.com> wrote:
>>
>>> df <- data.frame(y=c(0,2,4,6,8), x1=c(1,1,2,2,NA),
>>>                  x2=factor(c("A","A","A","A","B")))
>>> resid(lm(y~x1+x2, data=df, na.action=na.exclude))
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>