I have a factor (with "n" observations and "k" levels), with only "nobs" < n of the observations not missing. I would like to produce an (n x k) model matrix with treatment contrasts for this factor, with rows of NAs as placeholders for the missing observations. If I use model.matrix() I get back a (nobs x k) matrix. Is there an easy way to get the (n x k) matrix without carrying along a row ID and merging?

Thanks.

J.R. Lockwood
412-683-2300 x4941
lockwood at rand.org
http://www.rand.org/methodology/stat/members/lockwood/
I would re-expand the model matrix by indexing its (nobs) rows with a longer vector (of length n) containing the correspondence. If there is only one term (say "Z") in the formula which contains the problematic NAs, I would do (roughly)

    ff <- Y ~ Z    # following the example in ?model.matrix
    mat <- model.matrix(ff, model.frame(ff, data))[cumsum(!is.na(Z)), ]
    mat[is.na(Z), ] <- NA

The second line above creates an n x k matrix in which each row where Z is NA simply duplicates the last preceding non-NA row. The third line blanks out those duplicate rows by filling them with NAs instead.

This simple strategy fails if Z[1] is NA, because the index vector then starts with 0 and that row is dropped. I haven't time to think up a solution for that case, other than permuting rows in the entire data set so that it doesn't happen.

HTH

- tom blackwell - u michigan medical school - ann arbor -

On Mon, 27 Oct 2003, J.R. Lockwood wrote:

> I have a factor (with "n" observations and "k" levels), with only
> "nobs" < n of the observations not missing. I would like to produce a
> (n x k) model matrix with treatment contrasts for this factor, with
> rows of NAs placeholding the missing observations. If I use
> model.matrix() I get back a (nobs x k) matrix. Is there an easy way
> to get the (n x k) without carrying along a row ID and merging?
> Thanks.
>
> J.R. Lockwood
> 412-683-2300 x4941
> lockwood at rand.org
> http://www.rand.org/methodology/stat/members/lockwood/
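For the case where Z[1] is NA, one possible workaround is to allocate the full matrix of NAs first and then copy the observed rows into place by logical index. This is only a sketch under assumptions: the data are made up for illustration, and the names `small` and `full` are hypothetical.

```r
# Made-up example data: a factor with missing values, including the first one.
Z <- factor(c(NA, "a", "b", "c", NA, "a", "b", "c"))
n <- length(Z)

# Build the usual (nobs x k) model matrix; na.omit is requested explicitly
# so the result does not depend on the session's na.action option.
small <- model.matrix(~ Z, data = model.frame(~ Z, na.action = na.omit))

# Allocate an (n x k) matrix of NAs, then fill in the rows where Z is observed.
full <- matrix(NA_real_, nrow = n, ncol = ncol(small),
               dimnames = list(seq_len(n), colnames(small)))
full[!is.na(Z), ] <- small
```

Because the rows are placed by position rather than by a running count, it does not matter where the NAs fall.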
Perhaps a much simpler method (just thought of it) would be to set

    options(na.action="na.pass")

before you start. Or pass na.action=na.pass (the function itself, without parentheses) as an argument in the call to model.frame(), since that's where the problem begins. See help("na.omit"), help("model.frame").

- tom blackwell - u michigan medical school - ann arbor -
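The options() route suggested above might look like the following. It is a minimal sketch on made-up data, relying on model.frame() falling back to the na.action option when no na.action argument is given; the previous setting is restored afterwards so other code is not affected.

```r
# Made-up example data: a factor with two missing values.
f <- factor(c("a", "b", NA, "c", "a", NA))

# Temporarily switch the session-wide NA handling to na.pass.
old <- options(na.action = "na.pass")
mm <- model.matrix(~ f)   # keeps one row per observation, NAs included
options(old)              # restore the previous na.action setting
```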
Strangely (to me), just passing na.action=na.pass to model.matrix doesn't work, although building the model frame with na.pass first does:

> f <- factor(rep(letters[1:3], 5))
> is.na(f[sample(15, 3)]) <- TRUE
> model.matrix(~f, data=model.frame(~f, na.action=na.pass))
   (Intercept) fb fc
1            1  0  0
2            1  1  0
3            1  0  1
4            1  0  0
5            1 NA NA
6            1  0  1
7            1  0  0
8            1 NA NA
9            1  0  1
10           1  0  0
11           1  1  0
12           1  0  1
13           1 NA NA
14           1  1  0
15           1  0  1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$f
[1] "contr.treatment"

> model.matrix(~f, na.action=na.pass)
   (Intercept) fb fc
1            1  0  0
2            1  1  0
3            1  0  1
4            1  0  0
6            1  0  1
7            1  0  0
9            1  0  1
10           1  0  0
11           1  1  0
12           1  0  1
14           1  1  0
15           1  0  1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$f
[1] "contr.treatment"

[OK, it's not so strange: na.action is not a documented argument for model.matrix, and the call to model.frame in model.matrix.default does not pass ... along, but shouldn't it?]

Andy

> -----Original Message-----
> From: Thomas W Blackwell [mailto:tblackw at umich.edu]
> Sent: Monday, October 27, 2003 3:08 PM
> To: J.R. Lockwood
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] expanding factor with NA
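The pattern that does work above (building the model frame with na.pass and handing it to model.matrix) could be wrapped up along these lines. This is a sketch only: `model_matrix_keep_na` is a hypothetical helper name, not a base R function, and the seed is set just to make the example repeatable.

```r
# Hypothetical helper (not part of base R): build the model frame with
# na.pass so rows with missing values are kept, then form the model matrix.
model_matrix_keep_na <- function(formula, data = environment(formula)) {
  mf <- model.frame(formula, data = data, na.action = na.pass)
  model.matrix(formula, data = mf)
}

set.seed(1)                          # only so the example is repeatable
f <- factor(rep(letters[1:3], 5))
is.na(f[sample(15, 3)]) <- TRUE      # knock out three observations
mm <- model_matrix_keep_na(~ f)      # 15 rows, NA in the rows where f is missing
```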