peter dalgaard
2015-Nov-20 09:57 UTC
[R] predict() works with the design matrix but throws error with some rows of that matrix
On 20 Nov 2015, at 10:07 , peter dalgaard <pdalgd at gmail.com> wrote:>> >> On 20 Nov 2015, at 04:53 , Damir Cosic <damir.cosic at gmail.com> wrote: >> >> Hello, >> >> I am having problems with predict() after a multinomial logit regression by >> multinom(). I generate a design matrix with model.matrix() and use it to >> estimate the model. Then, if I pass the entire design matrix to predict(), >> it returns the same output as fitted(), which is expected. But if I pass >> only a few rows of the design matrix, it throws this error: >> >> Error in model.frame.default(Terms, newdata, na.action = na.omit, xlev >> = object$xlevels) : variable lengths differ (found for 'z') In addition: >> >> Warning message: 'newdata' had 6 rows but variables found have 15 rows > > Offhand (sorry, no time for testing things this morning) I suspect that you are mixing paradigms. You can _either_ multiply coefficients with a design matrix _or_ look up variables in a data frame, and I think you are trying to look up variables in a matrix. In particular, I don't expect mm to have a column called "z".A little further thought later: The crux is that matrices will double as data frames in the newdata argument, but it will only work for numerical variables. Your model contains a numeric and a factor variable so it won't work, for two reasons:> head(mm)x ztall 1 -2.4963581 0 2 0.9895450 1 3 1.8755237 1 4 0.8911458 0 5 -2.1458457 1 6 0.6294571 1 I.e., there is no "z" column, but even if there were, it would mismatch with the model> colnames(mm)[2] <- "z" > p2<-predict(m,head(mm),"probs")Error: variable 'z' was fitted with type "factor" but type "numeric" was supplied In addition: Warning message: In model.frame.default(Terms, newdata, na.action = na.omit, xlev = object$xlevels) : variable 'z' is not a factor At this point, newdata=mm will fail too, as I predicted. (Pun almost unintended.) Notice that this works fine:> p2<-predict(m,head(df),"probs") > p2good bad ugly 1 9.998923e-01 8.171733e-05 2.598999e-05 2 2.804424e-05 4.170214e-01 5.829506e-01 3 2.892377e-05 3.255878e-01 6.743832e-01 4 9.999315e-01 2.818836e-05 4.031584e-05 5 1.863116e-05 7.420142e-01 2.579672e-01 6 2.740359e-05 4.563101e-01 5.436625e-01>-pd> > Accordingly, neither of your examples actually work, both cases find z (and x?) in the global environment, it is just only in the latter example that the inconsistency is discovered. I think you want either to use model.frame or an explicit mm %*% coef(model) (or thereabouts). > > >> >> This is a minimal example: >> >> require(nnet) >> >> y<-factor(rep(c(1,2,3),5), levels=1:3, labels=c("good","bad","ugly")) >> x<-rnorm(15)+.2*rep(1:3,5) >> z<-factor(rep(c(1,2,2),5), levels=1:2, labels=c("short","tall")) >> >> df<-data.frame(y=y, x=x, z=z) >> mm<-model.matrix(~x+z, data=df)[,2:3] >> m<-multinom(y ~ x+z, data=df) >> >> p1<-predict(m,mm,"probs") >> >> p2<-predict(m,head(mm),"probs") >> >> My actual goal is out-of-sample prediction, but I could not make it work >> and, while debugging it, I reduced it to this problem. >> >> Best, >> >> Damir >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Damir Cosic
2015-Nov-20 16:17 UTC
[R] predict() works with the design matrix but throws error with some rows of that matrix
Peter, this is extremely helpful. It is my first time using model.matrix() and I was not very clear about it.Thank you for clarifying it! It also helped me figure out how to do out-of-sample prediction. I add it here in case someone else needs help with this. oos.df<-data.frame(x=c(0,1), z=factor(c(1,2),levels=1:2, labels=c("short","tall"))) p4<-predict(m, oos.df, "probs") To follow up on your comment how design matrix can be used for multiplying the coefficients and not for looking up variables, I tried to do just that, but I am getting different results from those returned by predict(). For example, head(predict(m, df, "probs")) returns good bad ugly 1 9.999256e-01 3.419782e-05 4.018257e-05 2 1.064469e-05 5.163435e-01 4.836458e-01 3 8.623258e-06 4.523593e-01 5.476321e-01 4 9.999418e-01 3.116868e-05 2.702482e-05 5 1.289231e-05 5.789110e-01 4.210761e-01 6 9.213303e-06 4.718692e-01 5.281215e-01 To do the same with matrix multiplication, I use the expression 1/(1+exp(-xb)): head(1/(1+exp(-(cbind(c(1), mm) %*% coef(m)[2,])))) which should produce the third column above, but it doesn't: [,1] 1 4.018394e-05 2 9.999780e-01 3 9.999843e-01 4 2.702566e-05 5 9.999694e-01 6 9.999826e-01 On Fri, Nov 20, 2015 at 4:57 AM, peter dalgaard <pdalgd at gmail.com> wrote:> > On 20 Nov 2015, at 10:07 , peter dalgaard <pdalgd at gmail.com> wrote: > > >> > >> On 20 Nov 2015, at 04:53 , Damir Cosic <damir.cosic at gmail.com> wrote: > >> > >> Hello, > >> > >> I am having problems with predict() after a multinomial logit > regression by > >> multinom(). I generate a design matrix with model.matrix() and use it to > >> estimate the model. Then, if I pass the entire design matrix to > predict(), > >> it returns the same output as fitted(), which is expected. But if I pass > >> only a few rows of the design matrix, it throws this error: > >> > >> Error in model.frame.default(Terms, newdata, na.action = na.omit, xlev > >> = object$xlevels) : variable lengths differ (found for 'z') In > addition: > >> > >> Warning message: 'newdata' had 6 rows but variables found have 15 rows > > > > Offhand (sorry, no time for testing things this morning) I suspect that > you are mixing paradigms. You can _either_ multiply coefficients with a > design matrix _or_ look up variables in a data frame, and I think you are > trying to look up variables in a matrix. In particular, I don't expect mm > to have a column called "z". > > A little further thought later: The crux is that matrices will double as > data frames in the newdata argument, but it will only work for numerical > variables. Your model contains a numeric and a factor variable so it won't > work, for two reasons: > > > head(mm) > x ztall > 1 -2.4963581 0 > 2 0.9895450 1 > 3 1.8755237 1 > 4 0.8911458 0 > 5 -2.1458457 1 > 6 0.6294571 1 > > I.e., there is no "z" column, but even if there were, it would mismatch > with the model > > > colnames(mm)[2] <- "z" > > p2<-predict(m,head(mm),"probs") > Error: variable 'z' was fitted with type "factor" but type "numeric" was > supplied > In addition: Warning message: > In model.frame.default(Terms, newdata, na.action = na.omit, xlev > object$xlevels) : > variable 'z' is not a factor > > At this point, newdata=mm will fail too, as I predicted. (Pun almost > unintended.) > > Notice that this works fine: > > > p2<-predict(m,head(df),"probs") > > p2 > good bad ugly > 1 9.998923e-01 8.171733e-05 2.598999e-05 > 2 2.804424e-05 4.170214e-01 5.829506e-01 > 3 2.892377e-05 3.255878e-01 6.743832e-01 > 4 9.999315e-01 2.818836e-05 4.031584e-05 > 5 1.863116e-05 7.420142e-01 2.579672e-01 > 6 2.740359e-05 4.563101e-01 5.436625e-01 > > > > -pd > > > > > Accordingly, neither of your examples actually work, both cases find z > (and x?) in the global environment, it is just only in the latter example > that the inconsistency is discovered. I think you want either to use > model.frame or an explicit mm %*% coef(model) (or thereabouts). > > > > > >> > >> This is a minimal example: > >> > >> require(nnet) > >> > >> y<-factor(rep(c(1,2,3),5), levels=1:3, labels=c("good","bad","ugly")) > >> x<-rnorm(15)+.2*rep(1:3,5) > >> z<-factor(rep(c(1,2,2),5), levels=1:2, labels=c("short","tall")) > >> > >> df<-data.frame(y=y, x=x, z=z) > >> mm<-model.matrix(~x+z, data=df)[,2:3] > >> m<-multinom(y ~ x+z, data=df) > >> > >> p1<-predict(m,mm,"probs") > >> > >> p2<-predict(m,head(mm),"probs") > >> > >> My actual goal is out-of-sample prediction, but I could not make it work > >> and, while debugging it, I reduced it to this problem. > >> > >> Best, > >> > >> Damir > >> > >> [[alternative HTML version deleted]] > >> > >> ______________________________________________ > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > > > > -- > > Peter Dalgaard, Professor, > > Center for Statistics, Copenhagen Business School > > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > > Phone: (+45)38153501 > > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Office: A 4.23 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > > > > > > > > >[[alternative HTML version deleted]]
peter dalgaard
2015-Nov-20 17:23 UTC
[R] predict() works with the design matrix but throws error with some rows of that matrix
> On 20 Nov 2015, at 17:17 , Damir Cosic <damir.cosic at gmail.com> wrote: > > > To do the same with matrix multiplication, I use the expression 1/(1+exp(-xb)): > > head(1/(1+exp(-(cbind(c(1), mm) %*% coef(m)[2,])))) > > which should produce the third column above, but it doesn't: >I'm rusty on this, but I suspect that you need e_i/sum(e_i) with e_i = exp(x b_i) (and b_1 == 0 by convention) -pd -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com