Hi R-users,

I'm trying to find an elegant way to count the number of rows in a data frame that share each unique combination of values in two columns. My data has one column with a year, one with a month, and one with a day, and I'm trying to count the number of days in each year/month combination. For simplicity's sake, the following dataset will do:

x <- c(1,1,1,1,2,2,2,2,3,3,3,3)
y <- c(1,1,2,2,3,3,4,4,5,5,6,6)
z <- c(1,2,3,4,5,6,7,8,9,10,11,12)
X <- data.frame(x, y, z)

So with dataset X, how would I count the number of z values (3rd column in X) for each unique combination of the first two columns (x and y)? (For instance, in the above example, there are 2 instances per unique combination of the first two columns.) I can do this in Matlab and it's easy, but since I'm new to R this is royally stumping me.

Thanks,
Ryan

--
Ryan Utz
Postdoctoral research scholar
University of California, Santa Barbara
Henrique Dallazuanna
2011-Jan-25 19:35 UTC
[R] Counting number of rows with two criteria in dataframe
If you want the count:

xtabs(~ x + y, X)

or the sum:

xtabs(z ~ x + y, X)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
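The xtabs() result is a contingency table, so it also lists the x/y pairs that never occur (as zeros). Here is a minimal sketch of turning it into one row per observed combination, assuming the data frame X from the original post with the commas restored in data.frame(); the names counts and observed are only illustrative:

```r
# Example data from the original post
x <- c(1,1,1,1,2,2,2,2,3,3,3,3)
y <- c(1,1,2,2,3,3,4,4,5,5,6,6)
z <- c(1,2,3,4,5,6,7,8,9,10,11,12)
X <- data.frame(x, y, z)

# Long format: one row per cell of the x-by-y table, with columns x, y, Freq
counts <- as.data.frame(xtabs(~ x + y, data = X))

# Keep only the combinations that actually occur in the data
observed <- counts[counts$Freq > 0, ]
observed
```

Note that x and y come back as factors in this form, which may or may not matter for later steps.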
Ista Zahn
2011-Jan-25 19:51 UTC
[R] Counting number of rows with two criteria in dataframe
Hi Ryan,

One option would be

X$a <- paste(X$x, X$y, sep=".")
table(X$a)

Best,
Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
David Winsemius
2011-Jan-26 02:49 UTC
[R] Counting number of rows with two criteria in dataframe
On Jan 25, 2011, at 2:25 PM, Ryan Utz wrote:

> So with dataset X, how would I count the number of z values (3rd
> column in X) with unique combinations of the first two columns
> (x and y)?

> tapply(X$z, list(X$x, X$y), function(xx) length(unique(xx)) )
   1  2  3  4  5  6
1  2  2 NA NA NA NA
2 NA NA  2  2 NA NA
3 NA NA NA NA  2  2

David Winsemius, MD
West Hartford, CT
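In the example data every z value is distinct, so counting rows and counting unique z values give the same answer. A small sketch with made-up data (not from the thread) where the two differ, in case that distinction matters for the real year/month/day data:

```r
# Hypothetical data: the (x = 1, y = 1) group contains z = 10 twice
X2 <- data.frame(
  x = c(1, 1, 1, 2, 2),
  y = c(1, 1, 1, 3, 3),
  z = c(10, 10, 11, 5, 6)
)

# Number of rows per (x, y) combination
n_rows <- tapply(X2$z, list(X2$x, X2$y), length)

# Number of distinct z values per (x, y) combination
n_unique <- tapply(X2$z, list(X2$x, X2$y), function(v) length(unique(v)))

n_rows
n_unique
```

Combinations that do not occur in the data come back as NA in both tables.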
Dennis Murphy
2011-Jan-26 05:27 UTC
[R] Counting number of rows with two criteria in dataframe
Hi:
Here are two more candidates, using the plyr and data.table packages:
library(plyr)
ddply(X, .(x, y), function(d) length(unique(d$z)))
  x y V1
1 1 1  2
2 1 2  2
3 2 3  2
4 2 4  2
5 3 5  2
6 3 6  2
The function counts the number of unique z values in each sub-data frame
with the same x and y values. The argument d in the anonymous function is a
data frame object.
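For comparison, the same split-by-(x, y), count-unique-z idea can be sketched in base R with aggregate(), without any add-on package. This assumes the X from the original post; the column name n_unique_z is just an illustrative choice:

```r
# Example data from the original post
x <- c(1,1,1,1,2,2,2,2,3,3,3,3)
y <- c(1,1,2,2,3,3,4,4,5,5,6,6)
z <- c(1,2,3,4,5,6,7,8,9,10,11,12)
X <- data.frame(x, y, z)

# For each observed (x, y) combination, count the distinct z values
res <- aggregate(z ~ x + y, data = X, FUN = function(v) length(unique(v)))
names(res)[3] <- "n_unique_z"   # the summary column is named z by default
res
```

Unlike the tapply() and xtabs() approaches, this returns only the combinations that actually occur.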
# data.table version:
library(data.table)
dX <- data.table(X, key = 'x, y')
dX[, list(nz = length(unique(z))), by = 'x, y']
     x y nz
[1,] 1 1  2
[2,] 1 2  2
[3,] 2 3  2
[4,] 2 4  2
[5,] 3 5  2
[6,] 3 6  2
Setting the key sorts the data by the x, y combinations; nz is then computed
within each data subset.
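The code above reflects the data.table syntax of the time. As a hedged sketch, in more recent data.table releases the grouping columns are usually given with .() or as a character vector, and uniqueN() is available as a shortcut for length(unique()). The requireNamespace() guard is only there so the snippet does nothing if the package is not installed:

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Example data from the original post
  X <- data.frame(
    x = c(1,1,1,1,2,2,2,2,3,3,3,3),
    y = c(1,1,2,2,3,3,4,4,5,5,6,6),
    z = c(1,2,3,4,5,6,7,8,9,10,11,12)
  )

  dX <- as.data.table(X)

  # Number of distinct z values within each (x, y) group
  nz_by_group <- dX[, .(nz = uniqueN(z)), by = .(x, y)]
  print(nz_by_group)
}
```

If only the number of rows per group is needed, .N can be used in place of uniqueN(z).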
If you intend to do a lot of summarization/data manipulation in R, these
packages are worth learning.
HTH,
Dennis
Hadley Wickham
2011-Jan-26 13:40 UTC
[R] Counting number of rows with two criteria in dataframe
On Wed, Jan 26, 2011 at 5:27 AM, Dennis Murphy wrote:
> Here are two more candidates, using the plyr and data.table packages:
>
> library(plyr)
> ddply(X, .(x, y), function(d) length(unique(d$z)))

Another approach is to use the much faster count function:

count(unique(X))

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
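The "de-duplicate first, then count" idea can also be sketched in base R, applied here to the year/month/day layout described in the original question. The data below are made up for illustration, and the assumption is that the goal is the number of distinct days per year/month combination:

```r
# Hypothetical data: one row per observation; 2010-01-05 appears twice
dates <- data.frame(
  year  = c(2010, 2010, 2010, 2010, 2011, 2011),
  month = c(1, 1, 1, 2, 1, 1),
  day   = c(5, 5, 9, 3, 7, 8)
)

# Drop repeated year/month/day rows so each day is counted once
distinct_days <- unique(dates)

# Tabulate the remaining rows by year and month
days_per_month <- as.data.frame(table(distinct_days[c("year", "month")]),
                                responseName = "n_days")

# Keep only the year/month combinations that occur in the data
days_per_month <- days_per_month[days_per_month$n_days > 0, ]
days_per_month
```

Here year and month come back as factors; as.integer(as.character(...)) would turn them back into numbers if needed.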