I have just been bitten by a quirk in the behaviour of model.matrix.
I used model.matrix inside a function, and passed to it a formula
that was built elsewhere.

The formula was of the form ``y ~ x + w + z''. Now, model.matrix
cheerfully accepts formulae of this form, although it only
***needs*** the right hand side, i.e. ``~ x + w + z'' --- the ``y''
can be dropped (but in general needn't be).

The quirk by which I was bitten was that if the y column of the data
frame being used contains missing values, then the corresponding rows
are dropped (silently) and the resulting design matrix has rows
corresponding only to the non-missing values of y. This was not the
desired behaviour in my application.

Might I respectfully suggest to R Core that a WARNING be added to the
help for model.matrix to the effect that

    model.matrix(y~x + w + z,XXX)
and
    model.matrix(~x + w + z,XXX)

give DIFFERENT results if the column ``y'' of the data frame XXX
contains missing values?

cheers,

Rolf Turner
rolf at math.unb.ca
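To make the difference concrete, here is a minimal sketch using a small made-up data frame. The values in `XXX` are hypothetical. It assumes the default `na.action` option of a standard R session, which is `"na.omit"`.

```r
# Hypothetical data: the response y has one missing value (row 2);
# the predictors x, w and z are complete.
XXX <- data.frame(y = c(1.2, NA, 3.1, 4.8),
                  x = c(1, 2, 3, 4),
                  w = c(0.5, 0.1, 0.9, 0.3),
                  z = factor(c("a", "b", "a", "b")))

# Right-hand side only: y never enters the internal model frame.
X.rhs <- model.matrix(~ x + w + z, XXX)

# Two-sided formula: the internal model frame includes y, and the
# default na.action (na.omit) silently drops row 2.
X.two <- model.matrix(y ~ x + w + z, XXX)

nrow(X.rhs)       # 4
nrow(X.two)       # 3
rownames(X.two)   # "1" "3" "4"
```

Comparing the row names of the design matrix against those of the input data is one quick way to spot rows that have gone missing.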
I think you could get around that by using (inside your function)
something like this:
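old.o <- options(na.action = na.fail)    # stop with an error if any NA is present
# old.o <- options(na.action = na.pass)  # or: keep the rows that contain NAs
on.exit(options(old.o))                  # restore the previous setting on exit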
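Wrapped in a function, the idea might look like the sketch below. The helper name `design.keep.na` and the data are hypothetical, not part of R. Note that `on.exit()` needs the call `options(old.o)` to actually restore the old setting. Passing the saved list alone does nothing.

```r
# Hypothetical helper (not part of R): build a design matrix while
# temporarily setting the global na.action option to na.pass, so rows
# with missing values are kept rather than silently dropped.
design.keep.na <- function(fmla, dat) {
  old.o <- options(na.action = na.pass)  # options() returns the previous value
  on.exit(options(old.o))                # restore it however the function exits
  model.matrix(fmla, dat)
}

XXX <- data.frame(y = c(1.2, NA, 3.1, 4.8),
                  x = c(1, 2, 3, 4),
                  w = c(0.5, 0.1, 0.9, 0.3))

X <- design.keep.na(y ~ x + w, XXX)
nrow(X)                  # 4: the row with y = NA is kept
getOption("na.action")   # previous setting restored ("na.omit" by default)
```

Using `na.fail` instead makes the same call stop with an error whenever an NA is present. That at least turns the silent row-dropping into a loud failure. With `na.pass`, NAs in numeric predictors show up as NA entries in the design matrix.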
I hope it helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
----- Original Message -----
From: "Rolf Turner" <rolf at math.unb.ca>
To: <r-help at stat.math.ethz.ch>
Sent: Thursday, June 02, 2005 4:14 PM
Subject: [R] Caution on the use of model.matrix.
Brian Ripley wrote:

> <snip>
> But the real problem is more likely that Rolf has not passed
> model.matrix a model frame, so it calls model.frame() internally. The
> help page is a bit confused in that it says
>
>     data: a data frame created with 'model.frame'.
>
> which the default for the argument is not. So a better solution would
> then be to call model.frame and pass a model frame to model.matrix.
>
> delete.response() might also be useful.
>
> The suggested warning only applies if `data' is not supplied.

I don't grok this. I ***did*** supply data (in the form of a data
frame, not a model frame). My call was of the form

    X <- model.matrix(fmla,XXX)

where (originally) ``fmla'' was a formula with the structure
``y ~ x + w + z'', and XXX was a data frame with columns ``y'', ``x'',
``w'', and ``z''. (The response variable ``y'' had NAs in it, which
caused the problem.)

The data frame XXX was ``input data''; it was not created with
model.frame, but it was data nonetheless.

I replaced the foregoing call with

    X <- model.matrix(fmla[-2],XXX)

(the ``-2'' causing the ``y'' part of the formula to be discarded) and
got the results I wanted.

There may be a better way of achieving my goal, but I'm happy with my
method --- unless someone points out lurking hazards that have so far
not been apparent to me. I merely wanted to point out to others the
somewhat unintuitive behaviour of model.matrix.

cheers,

Rolf Turner
rolf at math.unb.ca
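For comparison, here is a sketch of the alternatives mentioned in this thread, run on a made-up data frame with hypothetical values:

1. Rolf's `fmla[-2]`.
2. `delete.response()`, as Brian Ripley suggested.
3. Building the model frame explicitly with `model.frame()` and passing that frame to `model.matrix()`.

Choosing `na.pass` in the third variant is an assumption that the application wants every row kept.

```r
XXX <- data.frame(y = c(1.2, NA, 3.1, 4.8),
                  x = c(1, 2, 3, 4),
                  w = c(0.5, 0.1, 0.9, 0.3),
                  z = factor(c("a", "b", "a", "b")))
fmla <- y ~ x + w + z

# 1. Drop the left-hand side: for a two-sided formula, fmla[[2]] is the
#    response, so fmla[-2] is the one-sided formula ~ x + w + z.
X1 <- model.matrix(fmla[-2], XXX)

# 2. Drop the response from the terms object instead of the formula.
X2 <- model.matrix(delete.response(terms(fmla)), XXX)

# 3. Build the model frame explicitly with a chosen na.action, then pass
#    the model frame (not the raw data frame) to model.matrix.
mf <- model.frame(fmla, XXX, na.action = na.pass)
X3 <- model.matrix(fmla, mf)

c(nrow(X1), nrow(X2), nrow(X3))  # 4 4 4
```

Variants 1 and 2 avoid the issue by never putting `y` into the model frame. Variant 3 keeps `y` but makes the NA handling an explicit argument rather than relying on the global option.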