Hello list.

I feel like an idiot. There exists a method called expand.grid which, from the documentation, appears to do just what I want, but then it doesn't, and I can't get it to behave.

Given a dataframe

dfr<-data.frame(c1=c("a", "b", NA, "a", "a"), c2=c("d", NA, "d", "e", "e"), c3=c("g", "h", "i", "j", "k"))

I would like to have a dataframe with all (unique) combinations of all the factors present.

In fact, I would like a simple solution for these two cases: given the three factor columns above, I would like both all _possible_ combinations of the factor levels, and all _present_ combinations of the factor levels (e.g. if I would do this for the first 4 rows of dfr, it would contain no combinations with c3="k").

It would also be nice to be able to choose whether or not NA's are included.

I'm convinced that some package holds a readymade solution, and I'm trying to switch from always writing my own stuff (get the number of levels per column, then use some apply magic) to using what is there, so thanks for any hints,

Nick Sabbe

G'day Nick,

On Wed, 19 Jan 2011 09:43:56 +0100
"Nick Sabbe" <nick.sabbe at ugent.be> wrote:

> Given a dataframe
>
> dfr<-data.frame(c1=c("a", "b", NA, "a", "a"), c2=c("d", NA, "d", "e",
> "e"), c3=c("g", "h", "i", "j", "k"))
>
> I would like to have a dataframe with all (unique) combinations of
> all the factors present.

Easy:

R> expand.grid(lapply(dfr, levels))
   c1 c2 c3
1   a  d  g
2   b  d  g
3   a  e  g
4   b  e  g
5   a  d  h
6   b  d  h
7   a  e  h
8   b  e  h
9   a  d  i
10  b  d  i
11  a  e  i
12  b  e  i
13  a  d  j
14  b  d  j
15  a  e  j
16  b  e  j
17  a  d  k
18  b  d  k
19  a  e  k
20  b  e  k

> In fact, I would like a simple solution for these two cases: given
> the three factor columns above, I would like both all _possible_
> combinations of the factor levels, and all _present_ combinations of
> the factor levels (e.g. if I would do this for the first 4 rows of
> dfr, it would contain no combinations with c3="k").

R> dfrpart <- lapply(dfr[1:4,], factor)
R> expand.grid(lapply(dfrpart, levels))
   c1 c2 c3
1   a  d  g
2   b  d  g
3   a  e  g
4   b  e  g
5   a  d  h
6   b  d  h
7   a  e  h
8   b  e  h
9   a  d  i
10  b  d  i
11  a  e  i
12  b  e  i
13  a  d  j
14  b  d  j
15  a  e  j
16  b  e  j

> It would also be nice to be able to choose whether or not NA's are
> included.

R> expand.grid(lapply(dfrpart, function(x) c(levels(x),
+                     if(any(is.na(x))) NA else NULL)))
     c1   c2 c3
1     a    d  g
2     b    d  g
3  <NA>    d  g
4     a    e  g
5     b    e  g
6  <NA>    e  g
7     a <NA>  g
8     b <NA>  g
9  <NA> <NA>  g
10    a    d  h
11    b    d  h
....

HTH.

Cheers,

        Berwin

Berwin A Turlach
School of Maths and Stats (M019), The University of Western Australia
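
A possible addendum for anyone rerunning this thread today, offered as a sketch rather than as part of Berwin's answer. It rests on two assumptions. First, the code above relies on the columns of dfr being factors; since R 4.0.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE is passed explicitly below. Second, "present" is read here as "combinations that actually occur as rows", which unique() gives directly; if it instead means "all combinations of the levels that occur in a subset", droplevels() followed by expand.grid() matches the dfrpart approach shown above. The object names possible, present, present_complete, sub and possible_sub are made up for illustration.

```r
# Example data with explicit factor columns. Without stringsAsFactors = TRUE,
# R >= 4.0.0 keeps the columns as character and levels() returns NULL.
dfr <- data.frame(c1 = c("a", "b", NA, "a", "a"),
                  c2 = c("d", NA, "d", "e", "e"),
                  c3 = c("g", "h", "i", "j", "k"),
                  stringsAsFactors = TRUE)

# All possible combinations of the non-NA levels: 2 * 2 * 5 rows.
possible <- expand.grid(lapply(dfr, levels))

# Combinations that actually occur as rows, NA included.
present <- unique(dfr)

# The same, keeping only rows that contain no NA.
present_complete <- unique(na.omit(dfr))

# For a subset, drop the levels that no longer occur, then expand:
# with the first four rows, c3 loses "k", leaving 2 * 2 * 4 rows.
sub <- droplevels(dfr[1:4, ])
possible_sub <- expand.grid(lapply(sub, levels))
```

Whether NA should count as a value of its own is then a matter of choosing between present and present_complete for observed rows, or of appending NA to the levels as in the last expand.grid() call above for the full grid.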