I have just been bitten by a quirk in the behaviour of model.matrix. I used model.matrix inside a function, and passed to it a formula that was built elsewhere.

The formula was of the form ``y ~ x + w + z''. Now, model.matrix cheerfully accepts formulae of this form, although it only ***needs*** the right hand side, i.e. ``~ x + w + z'' --- the ``y'' can be dropped (but in general needn't be).

The quirk by which I was bitten was that if the y column of the data frame being used contains missing values, then the corresponding rows are dropped (silently) and the resulting design matrix has rows corresponding only to the non-missing values of y. This was not the desired behaviour in my application.

Might I respectfully suggest to R Core that a WARNING be added to the help for model.matrix to the effect that

    model.matrix(y~x + w + z,XXX)
and
    model.matrix(~x + w + z,XXX)

give DIFFERENT results if the column ``y'' of the data frame XXX contains missing values?

cheers,

Rolf Turner
rolf at math.unb.ca

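A small reproduction of the behaviour described above may help. This is only a sketch: the data values are made up, the names XXX, y, x, w and z simply follow the post, and it assumes the session still has the factory-fresh setting options(na.action = "na.omit").

```r
# Made-up data; only the column names come from the post.
XXX <- data.frame(
  y = c(1.2, NA, 3.4, NA, 5.6),
  x = c(1, 2, 3, 4, 5),
  w = c(0.5, 0.1, 0.9, 0.3, 0.7),
  z = c(10, 20, 30, 40, 50)
)

# Two-sided formula: model.matrix builds a model frame that includes y,
# so rows with a missing y are removed by the na.action in force
# (assumed here to be the default, na.omit).
X_two <- model.matrix(y ~ x + w + z, XXX)

# One-sided formula: y is not part of the model frame, so its NAs
# play no role and every row is kept.
X_one <- model.matrix(~ x + w + z, XXX)

nrow(X_two)  # 3 under the assumed default na.action
nrow(X_one)  # 5
```

No warning or message is issued in either call, which is what makes the difference easy to miss.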
I think you could beat that using (inside your function) something like this:

    old.o <- options(na.action = na.fail)
    # old.o <- options(na.action = na.pass)
    on.exit(options(old.o))

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm

----- Original Message -----
From: "Rolf Turner" <rolf at math.unb.ca>
To: <r-help at stat.math.ethz.ch>
Sent: Thursday, June 02, 2005 4:14 PM
Subject: [R] Caution on the use of model.matrix.

> <snip>

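The options() suggestion above could be written out as a complete function along the following lines. This is a sketch rather than anyone's actual code: the function name design_matrix_keep_na and the example data are hypothetical, and it uses the na.pass variant so that rows with a missing response are kept. Note that on.exit() needs a call that restores the setting, hence options(old.o) rather than the bare object.

```r
# Hypothetical wrapper: build a design matrix without dropping rows
# whose response is NA, leaving the caller's options untouched.
design_matrix_keep_na <- function(fmla, data) {
  # options() returns the previous values, which we restore on exit.
  old.o <- options(na.action = "na.pass")
  on.exit(options(old.o), add = TRUE)
  model.matrix(fmla, data)
}

# Made-up example data.
XXX <- data.frame(y = c(1, NA, 3), x = c(1, 2, 3))
X <- design_matrix_keep_na(y ~ x, XXX)
nrow(X)  # 3: the row with the missing y is retained
```

Using na.fail instead of na.pass would turn the silent row removal into an error, which may be preferable when missing values are not expected at all. With na.pass, missing values in the predictors are passed through as well, so the result may need checking for NAs.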
Brian Ripley wrote:

> <snip> But the real problem is more likely that Rolf has not passed
> model.matrix a model frame, so it calls model.frame() internally. The
> help page is a bit confused in that it says
>
>     data: a data frame created with 'model.frame'.
>
> which the default for the argument is not. So a better solution would
> then be to call model.frame and pass a model frame to model.matrix.
>
> delete.response() might also be useful.
>
> The suggested warning only applies if `data' is not supplied.

I don't grok this. I ***did*** supply data (in the form of a data frame, not a model frame). My call was of the form

    X <- model.matrix(fmla,XXX)

where (originally) ``fmla'' was a formula with the structure ``y ~ x + w + z'', and XXX was a data frame with columns ``y'', ``x'', ``w'', and ``z''. (The response variable ``y'' had NAs in it, which caused the problem.)

The data frame XXX was ``input data''; it was not created with model.frame, but it was data nonetheless.

I replaced the foregoing call with

    X <- model.matrix(fmla[-2],XXX)

(the ``-2'' causing the ``y'' part of the formula to be discarded) and got the results I wanted.

There may be a better way of achieving my goal, but I'm happy with my method --- unless someone points out lurking hazards that have so far not been apparent to me.

I merely wanted to point out to others the somewhat unintuitive behaviour of model.matrix.

cheers,

Rolf Turner
rolf at math.unb.ca
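For comparison, here is a sketch of the three routes mentioned in this exchange: dropping the response with fmla[-2], using delete.response() on the terms, and building the model frame explicitly before calling model.matrix. The data are made up, and the second and third variants are one reading of the quoted suggestions, not code posted by anyone in the thread.

```r
# Made-up data; the names follow the thread.
XXX <- data.frame(
  y = c(1.2, NA, 3.4, NA, 5.6),
  x = c(1, 2, 3, 4, 5),
  w = c(0.5, 0.1, 0.9, 0.3, 0.7),
  z = c(10, 20, 30, 40, 50)
)
fmla <- y ~ x + w + z

# (a) Drop element 2 of the formula (the left-hand side).
X_a <- model.matrix(fmla[-2], XXX)

# (b) Remove the response from the terms object instead.
tt <- delete.response(terms(fmla))
X_b <- model.matrix(tt, XXX)

# (c) Build the model frame explicitly, choosing the NA handling,
#     then pass that model frame to model.matrix.
mf <- model.frame(fmla, XXX, na.action = na.pass)
X_c <- model.matrix(fmla, mf)

# All three keep every row of XXX.
c(nrow(X_a), nrow(X_b), nrow(X_c))
```

One possible hazard with (a): fmla[-2] assumes the formula is two-sided. If a one-sided formula such as ~ x + w + z is passed in, element 2 is the right-hand side, so the subsetting would discard the predictors rather than the response. Route (b) does not depend on that assumption.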