Marius Hofert
2013-Mar-11 12:59 UTC
[R] How to 'extend' a data.frame based on given variable combinations ?
Dear expeRts,

I have a data.frame with certain covariate combinations ('group' and 'year')
and corresponding values:

set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001,       2003, 2004, 2005,
                                      2003, 2004, 2005),
                value = rexp(7))

My goal is essentially to construct a data.frame which contains all (group, year)
combinations with corresponding number of values. This can easily be done with tapply():

as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN=length))) # => 2002 missing

However, the tricky part is now that I would like to have *all* years in between 2001 and 2005.
Although tapply() sees the missing year 2001 for group "B" (since group "A" has a value there),
tapply() does not 'see' the missing year 2002.

How can such a data.frame be constructed [ideally without using additional R packages]?

Here is a straightforward way (hopelessly inefficient for the application in mind):

num <- cbind(expand.grid(group = LETTERS[1:2], year=2001:2005), num=0)
covar <- c("group", "year")
for(i in seq_len(nrow(num)))
    num[i,"num"] <- sum(apply(x[,covar], 1, function(z) all(z == num[i,covar])))
num

Cheers,

Marius
Marius Hofert
2013-Mar-11 13:54 UTC
[R] How to 'extend' a data.frame based on given variable combinations ?
... okay, I found a solution:

set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001,       2003, 2004, 2005,
                                      2003, 2004, 2005),
                value = rexp(7))
tply <- as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN=length)),
                      nm=colnames(x)) # => 2002 missing
names(tply) <- c("group", "year", "num")
grid <- expand.grid(group = LETTERS[1:2], year=2001:2005) # all variable combinations
tply <- merge(grid, tply, by=c("group", "year"), all=TRUE) # merge the two data.frames
tply$num[is.na(tply$num)] <- 0
tply
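The merge-based solution above fills in the combinations that tapply() cannot see. A possible alternative, not posted in the thread, is to declare all years of interest as factor levels up front, so that table() reports empty cells as 0 directly. This is a minimal base-R sketch under that assumption; the names `yr` and `num` are chosen here for illustration only.

```r
set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001, 2003, 2004, 2005, 2003, 2004, 2005),
                value = rexp(7))

# Declare every year of interest as a level, including the unobserved 2002
yr <- factor(x$year, levels = 2001:2005)

# table() counts over the full cross-product of levels; empty cells become 0
num <- as.data.frame(table(group = x$group, year = yr), responseName = "num")
num
```

Note that `year` comes back as a factor in `num`; if a numeric column is needed, `as.numeric(as.character(num$year))` converts it.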
arun
2013-Mar-11 14:17 UTC
[R] How to 'extend' a data.frame based on given variable combinations ?
Hi,

Not sure whether it helps or not. You could use ?merge():

dat1 <- as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN=length)),
                      stringsAsFactors=FALSE)
dat2 <- expand.grid(group=LETTERS[1:2], year=2001:2005)
names(dat1)[1:2] <- names(dat2)
res <- merge(dat1, dat2, by=c("group","year"), all=TRUE)
res[is.na(res)] <- 0

A.K.
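A variant of the same merge idea, sketched here as an assumption rather than as anything posted in the thread, counts with aggregate() instead of tapply(). The point of the variation is that `year` stays numeric throughout, and `all.x = TRUE` keeps exactly the rows of the full grid. The names `cnt`, `grid` and `res` are illustrative.

```r
set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001, 2003, 2004, 2005, 2003, 2004, 2005),
                value = rexp(7))

# Count the observed (group, year) combinations; year remains numeric
cnt <- aggregate(value ~ group + year, data = x, FUN = length)
names(cnt)[3] <- "num"

# Full grid of combinations, including the unobserved year 2002
grid <- expand.grid(group = c("A", "B"), year = 2001:2005,
                    stringsAsFactors = FALSE)

# Left join onto the grid, then set the unmatched combinations to 0
res <- merge(grid, cnt, by = c("group", "year"), all.x = TRUE)
res$num[is.na(res$num)] <- 0
res <- res[order(res$group, res$year), ]
res
```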