> On 11 Mar 2016, at 17:56 , David Winsemius <dwinsemius at comcast.net> wrote:
>
>>
>> On Mar 11, 2016, at 12:48 AM, peter dalgaard <pdalgd at gmail.com> wrote:
>>
>>
>>> On 11 Mar 2016, at 08:25 , David Winsemius <dwinsemius at comcast.net> wrote:
>>>>
>> ...
>>>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), x3=rnorm(10))
>>>>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>>> contrasts can be applied
>>>
>>> Yes, and the error appears to come from `model.matrix`:
>>>
>>>> model.matrix(y~x1+factor(x2)+x3, dfrm)
>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>> contrasts can be applied only to factors with 2 or more levels
>>>
>>
>> Actually not. The above is because you use an explicit factor(x2). The actual smoking gun is this line in lm()
>>
>> mf$drop.unused.levels <- TRUE
>
> It's possible that modifying model.matrix to allow single level factors would then bump up against that check, but at the moment the traceback() from an error generated with data that has a single level factor and no call to factor in the formula still implicates code in model.matrix:

You're missing the point: model.matrix has a beef with 1-level factors, not with 2-level factors of which one level happens to be absent, which is what this thread was originally about. It is lm that via model.frame with drop.unused.levels=TRUE converts the latter factors to the former.

-pd

>
>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=factor(TRUE), x3=rnorm(10))
>> lm(y~x1+x2+x3, dfrm)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> contrasts can be applied only to factors with 2 or more levels
>> traceback()
> 5: stop("contrasts can be applied only to factors with 2 or more levels")
> 4: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
> 3: model.matrix.default(mt, mf, contrasts)
> 2: model.matrix(mt, mf, contrasts)
> 1: lm(y ~ x1 + x2 + x3, dfrm)
>
> --
> David.
>
>>
>> which someone must have thought was a good idea at some point....
>>
>> model.matrix itself is quite happy to leave factors alone and let subsequent code sort out any singularities, e.g.
>>
>>> model.matrix(y~x1+x2, data=df[1:2,])
>>   (Intercept) x1 x2B
>> 1           1  1   0
>> 2           1  1   0
>> attr(,"assign")
>> [1] 0 1 2
>> attr(,"contrasts")
>> attr(,"contrasts")$x2
>> [1] "contr.treatment"
>>
>>
>>
>>>> model.matrix(y~x1+x2+x3, dfrm)
>>>    (Intercept)          x1 x2TRUE         x3
>>> 1            1  0.04887847      1 -0.4199628
>>> 2            1 -1.04786688      1  1.3947923
>>> 3            1 -0.34896007      1 -2.1873666
>>> 4            1 -0.08866061      1  0.1204129
>>> 5            1 -0.41111366      1 -1.6631057
>>> 6            1 -0.83449110      1  1.1631801
>>> 7            1 -0.67887823      1  0.3207544
>>> 8            1 -1.12206068      1  0.6012040
>>> 9            1  0.05116683      1  0.3598696
>>> 10           1  1.74413583      1  0.3608478
>>> attr(,"assign")
>>> [1] 0 1 2 3
>>> attr(,"contrasts")
>>> attr(,"contrasts")$x2
>>> [1] "contr.treatment"
>>>
>>> --
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

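The distinction drawn in this message can be checked with a small sketch. The data frame `d` and its columns are made up for illustration; the only assumption is a factor `x2` declared with two levels, A and B, of which only A occurs in the data. model.matrix() on its own keeps the empty level and returns an all-zero x2B column, whereas model.frame() with drop.unused.levels=TRUE (the setting lm() applies, per the `mf$drop.unused.levels <- TRUE` line quoted in the thread) reduces x2 to one level, and lm() then stops with the contrasts error.

```r
# Illustrative data (hypothetical): x2 has levels A and B, but only A occurs.
set.seed(1)
d <- data.frame(y  = rnorm(4),
                x1 = rnorm(4),
                x2 = factor(rep("A", 4), levels = c("A", "B")))

# model.matrix() alone keeps both levels; the x2B column is simply all zero.
mm <- model.matrix(y ~ x1 + x2, data = d)
print(colnames(mm))

# model.frame() with drop.unused.levels = TRUE reduces x2 to a single level.
mf <- model.frame(y ~ x1 + x2, data = d, drop.unused.levels = TRUE)
print(nlevels(d$x2))   # levels as declared in the data
print(nlevels(mf$x2))  # levels after dropping the unused one

# lm() sets drop.unused.levels itself, so the one-level factor reaches
# model.matrix() and triggers the contrasts error; capture the message.
msg <- tryCatch({ lm(y ~ x1 + x2, data = d); NA_character_ },
                error = function(e) conditionMessage(e))
print(msg)
```

Calling droplevels() on the data beforehand would reproduce the same error directly from model.matrix(), which is consistent with the traceback shown earlier in the thread.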
> On Mar 11, 2016, at 2:07 PM, peter dalgaard <pdalgd at gmail.com> wrote:
>
> You're missing the point: model.matrix has a beef with 1-level factors, not with 2-level factors of which one level happens to be absent, which is what this thread was originally about. It is lm that via model.frame with drop.unused.levels=TRUE converts the latter factors to the former.
>
> -pd

I guess I did miss the point. Apologies for being obtuse. I thought that a one-level factor would have been "aliased out" when model.matrix "realized" that it was collinear with the intercept. (Further apologies for my projection of cognitive capacities onto a machine.)

Are you saying it remains desirable that an error be thrown rather than reporting an NA for coefficients and issuing a warning?

--
David.

David Winsemius
Alameda, CA, USA

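For comparison, this is how lm() already treats a numeric column that is collinear with the intercept. The data frame and the column name `one` are invented for illustration. The aliased coefficient is reported as NA rather than raising an error. This is only a sketch of the existing behaviour for numeric predictors, not a statement about what lm() does or should do with one-level factors.

```r
# Hypothetical data: 'one' is a constant column, collinear with the intercept.
d <- data.frame(y   = c(1.2, 0.7, 3.1, 2.4, 1.9),
                x1  = c(0.5, 1.5, 2.0, 3.5, 4.0),
                one = 1)

fit <- lm(y ~ x1 + one, data = d)

print(coef(fit))  # the coefficient for 'one' is NA (aliased)
print(fit$rank)   # rank of the fitted model matrix
```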
> On 11 Mar 2016, at 23:48 , David Winsemius <dwinsemius at comcast.net> wrote:
>
> I guess I did miss the point. Apologies for being obtuse. I thought that a one-level factor would have been "aliased out" when model.matrix "realized" that it was collinear with the intercept. (Further apologies for my projection of cognitive capacities onto a machine.)
>
> Are you saying it remains desirable that an error be thrown rather than reporting an NA for coefficients and issuing a warning?

For the moment I was just analyzing where this came from. Intuitively I'd be leaning in the opposite direction: dropping factor levels automatically is usually a bad thing.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School