Marius Hofert
2013-Mar-11 12:59 UTC
[R] How to 'extend' a data.frame based on given variable combinations ?
Dear expeRts,
I have a data.frame with certain covariate combinations ('group' and 'year')
and corresponding values:
set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001, 2003, 2004, 2005,
                                2003, 2004, 2005),
                value = rexp(7))
My goal is essentially to construct a data.frame which contains all
(group, year) combinations together with the corresponding number of values.
This can easily be done with tapply():
as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN=length))) # => 2002 missing
However, the tricky part is that I would like to have *all* years between
2001 and 2005. Although tapply() sees the missing year 2001 for group "B"
(since group "A" has a value there), it does not 'see' the missing year 2002,
which occurs in neither group.
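The behaviour described above can be checked directly. The following is a minimal sketch, assuming the example data from this post; the name `counts` is introduced here only for illustration. It shows that the columns of the tapply() result come from the years that actually occur in x, so an empty cell for an observed year appears as NA while an unobserved year has no column at all.

```r
set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001, 2003, 2004, 2005,
                                2003, 2004, 2005),
                value = rexp(7))

# Cross-tabulate the number of values per (group, year)
counts <- tapply(x$value, list(x$group, x$year), FUN = length)

colnames(counts)              # only the years that occur somewhere in x
is.na(counts["B", "2001"])    # the empty (B, 2001) cell is reported as NA
"2002" %in% colnames(counts)  # 2002 never occurs, so it has no column
```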
How can such a data.frame be constructed [ideally without using additional
R packages]?
Here is a straightforward way (hopelessly inefficient for the application
in mind):
num <- cbind(expand.grid(group = LETTERS[1:2], year=2001:2005), num=0)
covar <- c("group", "year")
for(i in seq_len(nrow(num)))
    num[i,"num"] <- sum(apply(x[,covar], 1, function(z) all(z == num[i,covar])))
num
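The row-by-row comparison in the loop can probably be avoided. One possible sketch, not taken from the thread: build a text key per (group, year) pair and count the keys in a single vectorised step. The names `key_x` and `key_grid` are hypothetical helpers introduced here, and the approach assumes that pasting group and year gives an unambiguous key.

```r
set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001, 2003, 2004, 2005,
                                2003, 2004, 2005),
                value = rexp(7))

# All (group, year) combinations of interest
num <- expand.grid(group = LETTERS[1:2], year = 2001:2005)

# One key per observed row and one key per grid row (hypothetical helper names)
key_x    <- paste(x$group, x$year)
key_grid <- paste(num$group, num$year)

# Using the grid keys as factor levels makes table() report a count for
# every grid row, including zeros, in grid order
num$num <- as.vector(table(factor(key_x, levels = key_grid)))
num
```

Since the grid keys are used as factor levels, the counts come back in the same order as the rows of the grid, so no merge step is needed.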
Cheers,
Marius
Marius Hofert
2013-Mar-11 13:54 UTC
[R] How to 'extend' a data.frame based on given variable combinations ?
... okay, I found a solution:
set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001, 2003, 2004, 2005,
                                2003, 2004, 2005),
                value = rexp(7))
tply <- as.data.frame(as.table(tapply(x$value, list(x$group, x$year),
                                      FUN=length))) # => 2002 missing
names(tply) <- c("group", "year", "num")
grid <- expand.grid(group = LETTERS[1:2], year=2001:2005) # all variable combinations
tply <- merge(grid, tply, by=c("group", "year"),
              all=TRUE) # merge the two data.frames
tply$num[is.na(tply$num)] <- 0
tply
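A possible alternative that also stays within base R, sketched here under the assumption that the wanted years are every integer between the smallest and largest observed year: declare those years as factor levels and let table() count, so that empty cells come out as 0 without a merge. The names `years`, `tab` and `res` are introduced for illustration only.

```r
set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001, 2003, 2004, 2005,
                                2003, 2004, 2005),
                value = rexp(7))

# Assumption: all years from the first to the last observed one are wanted
years <- seq(min(x$year), max(x$year))

# Making the full year range the factor levels forces table() to keep
# a cell for years that never occur (here 2002)
tab <- table(group = x$group, year = factor(x$year, levels = years))

# Long format with the count column named 'num'
res <- as.data.frame(tab, responseName = "num")
res$year <- as.integer(as.character(res$year)) # factor labels back to numbers
res
```

Unlike the tapply() route, table() reports empty cells as 0 rather than NA, so the replacement step for missing counts is not needed.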
arun
2013-Mar-11 14:17 UTC
[R] How to 'extend' a data.frame based on given variable combinations ?
Hi,
Not sure whether it helps or not.
You could use merge() (see ?merge):
dat1 <- as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN=length)),
                      stringsAsFactors=FALSE)
dat2 <- expand.grid(group=LETTERS[1:2], year=2001:2005)
names(dat1)[1:2] <- names(dat2)
res <- merge(dat1, dat2, by=c("group","year"), all=TRUE)
res[is.na(res)] <- 0
A.K.