I can think of many brute-force ways to do this outside of R, but was wondering if there was a simple/elegant solution within R instead.

I have a table that looks something like the following:

Factor1  Factor2     Value
A        11/11/2009  5
A        11/12/2009  4
B        11/11/2009  7
B        11/13/2009  8

From that I need to generate all permutations of Factor1 and Factor2 and force a 0 for any combination that doesn't exist in the actual data table. By way of example, I'd like the output for above to end up as:

Factor1  Factor2     Value
A        11/11/2009  5
A        11/12/2009  4
A        11/13/2009  0
B        11/11/2009  7
B        11/12/2009  0
B        11/13/2009  8

Truly appreciate any thoughts.

--
View this message in context: http://n4.nabble.com/Correcting-for-missing-data-combinations-tp961301p961301.html
Sent from the R help mailing list archive at Nabble.com.
On Fri, 11 Dec 2009, GL wrote:

> From that I need to generate all permutations of Factor1 and Factor2 and
> force a 0 for any combination that doesn't exist in the actual data table.

# copy the 'table' to the clipboard, then:

> dat <- read.table("clipboard", header=TRUE)
> res <- as.data.frame(xtabs(Value ~ ., dat))
> colnames(res) <- sub("Freq", "Value", colnames(res))
> res
  Factor1    Factor2 Value
1       A 11/11/2009     5
2       B 11/11/2009     7
3       A 11/12/2009     4
4       B 11/12/2009     0
5       A 11/13/2009     0
6       B 11/13/2009     8

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
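For readers who want to try the xtabs idea without the clipboard step (which depends on the platform), here is a minimal sketch that types the example table in directly. The final sort is an addition for illustration and is not part of the reply above. Two things to keep in mind: xtabs sums Value over duplicate Factor1/Factor2 pairs, so this only reproduces the original values when each pair occurs at most once, and the dates are treated as plain strings, so they sort alphabetically rather than chronologically.

```r
# Rebuild the example table inline so no clipboard is needed.
dat <- data.frame(
  Factor1 = c("A", "A", "B", "B"),
  Factor2 = c("11/11/2009", "11/12/2009", "11/11/2009", "11/13/2009"),
  Value   = c(5, 4, 7, 8)
)

# xtabs() sums Value within every Factor1 x Factor2 cell;
# cells with no rows in the data come out as 0.
res <- as.data.frame(xtabs(Value ~ Factor1 + Factor2, data = dat))
names(res)[names(res) == "Freq"] <- "Value"

# Illustrative extra step: sort by Factor1, then Factor2,
# to match the layout asked for in the question.
res <- res[order(res$Factor1, res$Factor2), ]
rownames(res) <- NULL
print(res)
```

If a pair can occur more than once and the individual rows must be kept, the expand.grid or merge approaches discussed in the other replies are probably a better fit.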
One approach would be to use expand.grid to generate all combinations and then match against what you have.

A short example:

# generate data - two factors - 4 levels in factor1, 26 levels in factor2
df <- data.frame(factor1 = sample(LETTERS[1:4], 100, replace=TRUE),
    factor2 = sample(letters, 100, replace=TRUE), value = runif(100))

# generate possible combinations
poss.comb <- expand.grid(factor1 = LETTERS[1:4], factor2 = letters)

# find matches
present <- paste(poss.comb$factor1, poss.comb$factor2) %in%
    paste(df$factor1, df$factor2)

# find possible combinations not in the data
poss.comb[!present, ]

# add 0 as value
zerodata <- cbind(poss.comb[!present, ], value=0)

# and append to data
rbind(df, zerodata)

In place of letters and LETTERS, you could use unique(Factor1) and unique(Factor2) from your own data in creating the poss.comb list.

HTH,

Greg

--
Greg Hirson
ghirson at ucdavis.edu

Graduate Student
Agricultural and Environmental Chemistry

1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616
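Applied to the table from the original question, the expand.grid approach might look like the sketch below. It takes the levels from the data with unique(), as suggested above. The explicit "|" separator in paste() and the final sort are assumptions added here: the separator guards against two different pairs pasting to the same key, and it only helps if "|" never appears in the factor values.

```r
dat <- data.frame(
  Factor1 = c("A", "A", "B", "B"),
  Factor2 = c("11/11/2009", "11/12/2009", "11/11/2009", "11/13/2009"),
  Value   = c(5, 4, 7, 8),
  stringsAsFactors = FALSE
)

# Every combination of the levels actually observed in the data.
poss.comb <- expand.grid(
  Factor1 = unique(dat$Factor1),
  Factor2 = unique(dat$Factor2),
  stringsAsFactors = FALSE
)

# One key per row; assumes "|" does not occur in either column.
key.all  <- paste(poss.comb$Factor1, poss.comb$Factor2, sep = "|")
key.seen <- paste(dat$Factor1, dat$Factor2, sep = "|")
present  <- key.all %in% key.seen

# Append the missing combinations with Value = 0, then sort.
filled <- rbind(dat, cbind(poss.comb[!present, ], Value = 0))
filled <- filled[order(filled$Factor1, filled$Factor2), ]
rownames(filled) <- NULL
print(filled)
```

Unlike the xtabs route, this keeps every original row as it is, so duplicate Factor1/Factor2 pairs are not summed.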
This is nice; the matching could be shortened by using merge:

### quoted from the previous message
> # generate data - two factors - 4 levels in factor1, 26 levels in factor2
> df <- data.frame(factor1 = sample(LETTERS[1:4], 100, replace=TRUE),
>     factor2 = sample(letters, 100, replace=TRUE), value = runif(100))
>
> # generate possible combinations
> poss.comb <- expand.grid(factor1 = LETTERS[1:4], factor2 = letters)

## to merge the two dataframes:
adf <- merge(poss.comb, df, all.x = TRUE)
adf$value[is.na(adf$value)] <- 0

Though you may want to leave these values as "NA". Using expand.grid(factor1 = unique(factor1), factor2 = unique(factor2)) could also help.

--Gray

--
Gray Calhoun
Assistant Professor of Economics
Iowa State University
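As a rough illustration of the merge route on the table from the original question, the sketch below joins the full grid against the data and then fills the gaps. The Observed column is a hypothetical name introduced here, not something from the thread. It records which rows were really in the data before the NAs are overwritten, which is one way to act on the caveat about leaving the values as NA.

```r
dat <- data.frame(
  Factor1 = c("A", "A", "B", "B"),
  Factor2 = c("11/11/2009", "11/12/2009", "11/11/2009", "11/13/2009"),
  Value   = c(5, 4, 7, 8),
  stringsAsFactors = FALSE
)

# Full grid of the levels observed in the data.
poss.comb <- expand.grid(
  Factor1 = unique(dat$Factor1),
  Factor2 = unique(dat$Factor2),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of poss.comb; combinations that are
# missing from dat get Value = NA.
adf <- merge(poss.comb, dat, by = c("Factor1", "Factor2"), all.x = TRUE)

# Hypothetical flag: remember which rows were real observations
# before replacing NA with 0.
adf$Observed <- !is.na(adf$Value)
adf$Value[is.na(adf$Value)] <- 0

adf <- adf[order(adf$Factor1, adf$Factor2), ]
rownames(adf) <- NULL
print(adf)
```

Naming the by columns explicitly is a small safeguard: without it, merge joins on every column name the two data frames share, which can be surprising if the real data carries extra columns.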