Hi R-users,

I'm trying to find an elegant way to count the number of rows in a data frame that share each unique combination of values in two columns. My data has one column with a year, one with a month, and one with a day, and I'm trying to count the number of days in each year/month combination. For simplicity's sake, the following dataset will do:

    x <- c(1,1,1,1,2,2,2,2,3,3,3,3)
    y <- c(1,1,2,2,3,3,4,4,5,5,6,6)
    z <- c(1,2,3,4,5,6,7,8,9,10,11,12)
    X <- data.frame(x, y, z)

With dataset X, how would I count the number of z values (the 3rd column in X) for each unique combination of the first two columns (x and y)? In the example above, there are 2 instances per unique combination of the first two columns. I can do this easily in Matlab, but since I'm new to R this is royally stumping me.

Thanks,
Ryan

--
Ryan Utz
Postdoctoral research scholar
University of California, Santa Barbara
(724) 272 7769
Henrique Dallazuanna
2011-Jan-25 19:35 UTC
[R] Counting number of rows with two criteria in dataframe
If you want a count:

    xtabs(~ x + y, X)

or a sum:

    xtabs(z ~ x + y, X)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
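The xtabs() result is a full x-by-y table, so combinations that never occur show up as zeros. A minimal sketch, assuming a long listing of only the observed combinations is wanted; the name `counts` is made up for illustration:

```r
# Rebuild the example data from the original question
x <- c(1,1,1,1,2,2,2,2,3,3,3,3)
y <- c(1,1,2,2,3,3,4,4,5,5,6,6)
z <- 1:12
X <- data.frame(x, y, z)

# Cross-tabulate, then convert the table to one row per (x, y) cell
counts <- as.data.frame(xtabs(~ x + y, X))

# Keep only the combinations that actually occur in the data
counts <- counts[counts$Freq > 0, ]
print(counts)
```

Note that as.data.frame() on a table returns x and y as factors, so they may need converting back to numbers before further arithmetic.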
Ista Zahn
2011-Jan-25 19:51 UTC
[R] Counting number of rows with two criteria in dataframe
Hi Ryan,

One option would be

    X$a <- paste(X$x, X$y, sep=".")
    table(X$a)

Best,
Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
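A related sketch that avoids adding a helper column: interaction() builds the combined "x.y" grouping factor directly. It also shows why the separator in paste() matters. The names `combo`, `ambiguous_a`, and `ambiguous_b` are illustrative only.

```r
# Rebuild the example data from the original question
x <- c(1,1,1,1,2,2,2,2,3,3,3,3)
y <- c(1,1,2,2,3,3,4,4,5,5,6,6)
z <- 1:12
X <- data.frame(x, y, z)

# Combined grouping factor; drop = TRUE discards unobserved combinations
combo <- table(interaction(X$x, X$y, drop = TRUE))
print(combo)

# Without a separator, different pairs can collapse to the same key
ambiguous_a <- paste(1, 11, sep = "")   # x = 1,  y = 11
ambiguous_b <- paste(11, 1, sep = "")   # x = 11, y = 1
print(ambiguous_a == ambiguous_b)
```

With real year/month data the "." separator used above keeps keys such as 2011.1 and 201.11 distinct.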
David Winsemius
2011-Jan-26 02:49 UTC
[R] Counting number of rows with two criteria in dataframe
On Jan 25, 2011, at 2:25 PM, Ryan Utz wrote:
> So with dataset X, how would I count the number of z values (3rd column in
> X) with unique combinations of the first two columns (x and y)?

    tapply(X$z, list(X$x, X$y), function(xx) length(unique(xx)) )

       1  2  3  4  5  6
    1  2  2 NA NA NA NA
    2 NA NA  2  2 NA NA
    3 NA NA NA NA  2  2

David Winsemius, MD
West Hartford, CT
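Whether to use length() or length(unique()) inside tapply() depends on whether rows or distinct values are being counted; the two agree on the example data only because every z is different. A small sketch on made-up data (`X2` is hypothetical, not from the thread) where they differ:

```r
# Hypothetical data with repeated z values inside a group
X2 <- data.frame(x = c(1, 1, 1, 2, 2),
                 y = c(1, 1, 1, 3, 3),
                 z = c(5, 5, 6, 7, 7))

# Number of rows in each (x, y) cell
n_rows <- tapply(X2$z, list(X2$x, X2$y), length)

# Number of distinct z values in each (x, y) cell
n_distinct <- tapply(X2$z, list(X2$x, X2$y), function(v) length(unique(v)))

print(n_rows)
print(n_distinct)
```

Cells for combinations that never occur come back as NA in both results, as in the output above.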
Dennis Murphy
2011-Jan-26 05:27 UTC
[R] Counting number of rows with two criteria in dataframe
Hi:

Here are two more candidates, using the plyr and data.table packages:

    library(plyr)
    ddply(X, .(x, y), function(d) length(unique(d$z)))

      x y V1
    1 1 1  2
    2 1 2  2
    3 2 3  2
    4 2 4  2
    5 3 5  2
    6 3 6  2

The function counts the number of unique z values in each sub-data frame with the same x and y values. The argument d in the anonymous function is a data frame object.

    # data.table version:
    library(data.table)
    dX <- data.table(X, key = 'x, y')
    dX[, list(nz = length(unique(z))), by = 'x, y']

         x y nz
    [1,] 1 1  2
    [2,] 1 2  2
    [3,] 2 3  2
    [4,] 2 4  2
    [5,] 3 5  2
    [6,] 3 6  2

The key columns sort the data by x, y combinations, and nz is then computed within each data subset.

If you intend to do a lot of summarization/data manipulation in R, these packages are worth learning.

HTH,
Dennis
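The split-apply-combine idea behind the ddply() call can be sketched in base R with split() and sapply(). This is only an illustration of the concept, not how plyr is implemented; `pieces` and `nz` are made-up names.

```r
# Rebuild the example data from the original question
x <- c(1,1,1,1,2,2,2,2,3,3,3,3)
y <- c(1,1,2,2,3,3,4,4,5,5,6,6)
z <- 1:12
X <- data.frame(x, y, z)

# Split: one sub-data frame per observed (x, y) combination
pieces <- split(X, list(X$x, X$y), drop = TRUE)

# Apply and combine: each d is a data frame, as in the ddply() call
nz <- sapply(pieces, function(d) length(unique(d$z)))
print(nz)
```

The result is a named vector keyed by "x.y" rather than a data frame, so ddply() or data.table remain more convenient when the grouping columns are needed afterwards.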
Hadley Wickham
2011-Jan-26 13:40 UTC
[R] Counting number of rows with two criteria in dataframe
On Wed, Jan 26, 2011 at 5:27 AM, Dennis Murphy <djmuser at gmail.com> wrote:
> Here are two more candidates, using the plyr and data.table packages:
>
> library(plyr)
> ddply(X, .(x, y), function(d) length(unique(d$z)))
>
> The function counts the number of unique z values in each sub-data frame
> with the same x and y values. The argument d in the anonymous function is a
> data frame object.

Another approach is to use the much faster count function, restricting the count to the x and y columns:

    count(unique(X), c("x", "y"))

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
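The same de-duplicate-then-count idea, applied to the year/month/day layout from the original question, might look like this in base R. The data frame `obs` and its values are invented for illustration, and aggregate() stands in for plyr's count().

```r
# Hypothetical observations; the 2010-01-03 row appears twice
obs <- data.frame(year  = c(2010, 2010, 2010, 2010, 2011, 2011),
                  month = c(1, 1, 1, 2, 1, 1),
                  day   = c(3, 3, 4, 10, 7, 8))

# Drop exact duplicate rows so each day is counted once
distinct_days <- unique(obs)

# Count the remaining rows (days) per year/month combination
days_per_month <- aggregate(day ~ year + month, data = distinct_days,
                            FUN = length)
print(days_per_month)
```

Here the `day` column of the result holds the number of distinct days in each year/month, not a day of the month.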