> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of David
> Winsemius
> Sent: Thursday, March 10, 2016 4:39 PM
> To: Robert McGehee
> Cc: r-help at r-project.org
> Subject: Re: [R] Regression with factor having1 level
>
> > On Mar 10, 2016, at 2:00 PM, Robert McGehee <rmcgehee at gmail.com>
> wrote:
> >
> > Hello R-helpers,
> > I'd like a function that given an arbitrary formula and a data frame
> > returns the residual of the dependent variable, and maintains all NA values.
>
> What does "maintains all NA values" actually mean?
>
> > Here's an example that will give me what I want if my formula is
> > y~x1+x2+x3 and my data frame is df:
> >
> > resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
> >
> > Here's the catch, I do not want my function to ever fail due to a
> > factor with only one level. A one-level factor may appear because 1)
> > the user passed it in, or 2) (more common) only one factor in a term
> > is left after na.exclude removes the other NA values.
> >
> > Here is the error I would get
>
> From what code?
>
> > above if one of the terms was a factor with one level:
> > Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> > contrasts can be applied only to factors with 2 or more levels
>
> Unable to create that error with the actions you describe but do not actually
> offer in coded form:
>
> > dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=TRUE, x3=rnorm(10))
> > lm(y~x1+x2+x3, dfrm)
>
> Call:
> lm(formula = y ~ x1 + x2 + x3, data = dfrm)
>
> Coefficients:
> (Intercept)           x1       x2TRUE           x3
>    -0.16274     -0.30032           NA     -0.09093
>
> > resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
>           1           2           3           4           5           6
> -0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239
>           7           8           9          10
> -0.05965653 -2.17480605  1.42917190 -0.65103650
>
> > Instead of giving me an error, I'd like the function to do just what
> > lm() normally does when it sees a variable with no variance, ignore
> > the variable (coefficient is NA) and continue to regress out all the other
> > variables.
> > Thus if 'x2' is a factor with one level in the above example, I'd
> > like the function to return the result of:
> > resid(lm(y~x1+x3, data=df, na.action=na.exclude))
> > Can anyone provide me a straightforward recommendation for how to do this?
> > I feel like it should be easy, but I'm honestly stuck, and my Google
> > searching for this hasn't gotten anywhere. The key is that I'd like
> > the solution to be generic enough to work with an arbitrary linear
> > formula, and not substantially kludgy (like trying every combination of
> > regression terms until one works) as I'll be running this a lot on
> > big data sets and don't want my computation time swamped by running
> > unnecessary regressions or checking for number of factors after removing
> > NAs.
> >
> > Thanks in advance!
> > --Robert
> >
> > PS. The Google search feature in the R-help archives appears to be down:
> > http://tolstoy.newcastle.edu.au/R/
>
> It's working for me.
>
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>

I agree that what is wanted is not clear. However, if dfrm is created with x2 as a factor, then you get the error message that the OP mentions when you run the regression.

> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), x3=rnorm(10))
> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied

Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services

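For readers who want to reproduce the difference between the two data frames in this exchange, here is a minimal sketch. The names `dfrm_lgl`, `dfrm_fac` and `err`, and the `set.seed()` call, are assumptions added for illustration; only the `data.frame()` and `lm()` calls come from the messages above.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# x2 as a constant logical column, as in David's example
dfrm_lgl <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = TRUE, x3 = rnorm(10))

# x2 as a one-level factor, as in Dan's example
dfrm_fac <- dfrm_lgl
dfrm_fac$x2 <- as.factor(dfrm_fac$x2)

# Logical column: lm() fits and reports an NA coefficient for x2TRUE
fit_lgl <- lm(y ~ x1 + x2 + x3, data = dfrm_lgl, na.action = na.exclude)
print(coef(fit_lgl))

# One-level factor: lm() stops with the contrasts error;
# tryCatch() captures the condition instead of aborting the script
err <- tryCatch(
  lm(y ~ x1 + x2 + x3, data = dfrm_fac, na.action = na.exclude),
  error = function(e) e
)
print(conditionMessage(err))
```

So the two posters were looking at different inputs: a constant logical column yields an aliased (NA) coefficient, while a factor with a single level stops the fit.
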
> On Mar 10, 2016, at 5:45 PM, Nordlund, Dan (DSHS/RDA) <NordlDJ at dshs.wa.gov> wrote:
>
>> ...
>
> I agree that what is wanted is not clear. However, if dfrm is created with x2 as a factor, then you get the error message that the OP mentions when you run the regression.
>
>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), x3=rnorm(10))
>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> contrasts can be applied

Yes, and the error appears to come from `model.matrix`:

> model.matrix(y~x1+factor(x2)+x3, dfrm)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

> model.matrix(y~x1+x2+x3, dfrm)
   (Intercept)          x1 x2TRUE         x3
1            1  0.04887847      1 -0.4199628
2            1 -1.04786688      1  1.3947923
3            1 -0.34896007      1 -2.1873666
4            1 -0.08866061      1  0.1204129
5            1 -0.41111366      1 -1.6631057
6            1 -0.83449110      1  1.1631801
7            1 -0.67887823      1  0.3207544
8            1 -1.12206068      1  0.6012040
9            1  0.05116683      1  0.3598696
10           1  1.74413583      1  0.3608478
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"

--
David Winsemius
Alameda, CA, USA

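One way to see which variables would trip the contrasts step, before any fitting, is to inspect the model frame. The sketch below is an illustration under assumptions, not something proposed in the thread: the helper name `single_level_factors` and the example data `d` are hypothetical, and the helper drops NA rows with `na.omit` and then drops unused levels, on the assumption that this matches the rows and levels `lm()` would see.

```r
# Hypothetical helper: names of model-frame columns that are factors with
# fewer than two levels once NA rows and unused levels have been dropped.
single_level_factors <- function(formula, data) {
  mf <- model.frame(formula, data, na.action = na.omit,
                    drop.unused.levels = TRUE)
  is_bad <- vapply(mf, function(col) is.factor(col) && nlevels(col) < 2L,
                   logical(1))
  names(mf)[is_bad]
}

set.seed(1)  # assumption: seed only for reproducibility
d <- data.frame(
  y  = rnorm(6),
  x1 = c(0.5, NA, 1.5, NA, 2.5, 3.5),
  x2 = factor(c("A", "B", "A", "B", "A", "A")),
  x3 = rnorm(6)
)

# Level "B" only occurs in rows where x1 is NA, so x2 is flagged
print(single_level_factors(y ~ x1 + x2 + x3, d))

# Without x1 in the formula no rows are dropped and nothing is flagged
print(single_level_factors(y ~ x2 + x3, d))
```

This builds the model frame once and fits nothing, which may matter for the large data sets mentioned in the original question.
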
> On 11 Mar 2016, at 08:25 , David Winsemius <dwinsemius at comcast.net> wrote:
>
>>...
>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), x3=rnorm(10))
>>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>> contrasts can be applied
>
> Yes, and the error appears to come from `model.matrix`:
>
>> model.matrix(y~x1+factor(x2)+x3, dfrm)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> contrasts can be applied only to factors with 2 or more levels
>

Actually not. The above is because you use an explicit factor(x2). The actual smoking gun is this line in lm()

    mf$drop.unused.levels <- TRUE

which someone must have thought was a good idea at some point....

model.matrix itself is quite happy to leave factors alone and let subsequent code sort out any singularities, e.g.

> model.matrix(y~x1+x2, data=df[1:2,])
  (Intercept) x1 x2B
1           1  1   0
2           1  1   0
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
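
Peter's point about `drop.unused.levels` suggests one possible route to the residuals the original poster asked for: build the model frame without dropping unused levels and hand the design matrix to `lm.fit()`. The following is only a sketch under assumptions. The function name `resid_keep_levels` and the example data `d` are hypothetical, and it covers only the case where a factor has two or more declared levels but a single observed one after NA removal; a factor passed in with exactly one level would still fail in `model.matrix()`.

```r
# Hypothetical sketch: residuals padded with NA, computed without dropping
# unused factor levels (lm() itself sets drop.unused.levels = TRUE).
resid_keep_levels <- function(formula, data) {
  mf <- model.frame(formula, data, na.action = na.exclude,
                    drop.unused.levels = FALSE)
  X <- model.matrix(attr(mf, "terms"), mf)       # all-zero columns are allowed
  fit <- lm.fit(X, model.response(mf))           # aliased coefficients become NA
  naresid(attr(mf, "na.action"), fit$residuals)  # re-insert NA for excluded rows
}

d <- data.frame(
  y  = c(1.2, 0.7, 3.1, 2.4, 5.0, 4.1),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = factor(c("A", "A", NA, "A", "A", NA), levels = c("A", "B"))
)

# lm() fails here because only level "A" survives the NA removal
err <- tryCatch(lm(y ~ x1 + x2, data = d, na.action = na.exclude),
                error = function(e) e)
print(inherits(err, "error"))

# The sketch returns one residual per input row, NA where x2 is missing
r <- resid_keep_levels(y ~ x1 + x2, d)
print(r)
```

Where `lm()` itself succeeds, this should give the same residuals as `resid(lm(..., na.action = na.exclude))`, since an all-zero dummy column does not change the fitted values. The difference only shows when a level disappears after NA removal.
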