Josip Dasovic
2008-Aug-27 17:11 UTC
[R] Calculating total observations based on combinations of variable values
Hello:

As someone making the move from STATA to R, I'm finding it difficult at times to perform basic tasks in R, so forgive me if I've missed an obvious and easily obtained solution to my problem. I've searched the help guides and the archives and have not been able to find a solution that works.

I have a data frame with thousands of observations that looks something like this:

YEAR  MONTH  DAY  COUNTRY    REGION      PROVINCE             CITY
1994  1      22   Sri Lanka  South Asia  Northern (Province)  Pungudutivu
1994  1      25   Sri Lanka  South Asia  Central (Province)   Kandy
1994  2      26   Sri Lanka  South Asia  Central (Province)   Kandy
1994  2      28   Sri Lanka  South Asia  Eastern (Province)   Wakianeri
1994  6      28   Sri Lanka  South Asia  Eastern (Province)   Valachenai
1994  6      31   Sri Lanka  South Asia  Central (Province)   Kandy
1995  3      1    Sri Lanka  South Asia  North (Province)     Kilinochchi
1995  3      6    Sri Lanka  South Asia  Western (Province)   Colombo
1995  7      15   Sri Lanka  South Asia  Northern (Province)  Mankulam
1995  7      23   Sri Lanka  South Asia  Northern (Province)  Point Pedro
1995  9      25   Sri Lanka  South Asia  Northern (Province)  Kilali
...

What I would like to do is calculate the total number of observations for each unique combination of the values of (some of the) variables above.

For example, I would like to know how many observations (i.e. rows) have the values YEAR==1994 and MONTH==1.

In the end, I'd like a table that looks like this:

YEAR  MONTH  #OBS
1994  1      2
1994  2      2
1994  3      0
1994  4      0
1994  5      0
1994  6      2
1994  7      0
1994  8      0
1994  9      0
1994  10     0
1994  11     0
1994  12     0
1995  1      0
1995  2      0
1995  3      2
1995  4      0
...

I do need to fill out the table with all the possible combinations, even where there are no observations with that combination in the data set.

At first, it seemed like this would not be difficult. I think that aggregate is probably the way to go, but there doesn't seem to be an appropriate summary function (FUN) available.

Thanks in advance for any help in this matter,

Josip
Dylan Beaudette
2008-Aug-27 17:20 UTC
[R] Calculating total observations based on combinations of variable values
On Wednesday 27 August 2008, Josip Dasovic wrote:
> What I would like to do is to calculate the total number of observations by
> unique combinations of the values of (some of the) variables above.

?table
?xtabs

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
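A minimal sketch of how the table() pointer above might be applied to the question. The toy data frame x below is an assumption that copies only the YEAR and MONTH values from the sample rows, and the column name OBS is chosen here for illustration. The idea is that converting MONTH to a factor with levels 1 to 12 should make table() report months that never occur as zero counts.

```r
# Toy data: YEAR and MONTH values copied from the sample rows in the question.
x <- data.frame(
  YEAR  = c(1994, 1994, 1994, 1994, 1994, 1994, 1995, 1995, 1995, 1995, 1995),
  MONTH = c(1, 1, 2, 2, 6, 6, 3, 3, 7, 7, 9)
)

# A factor with all twelve levels keeps empty months in the cross-tabulation.
x$MONTH <- factor(x$MONTH, levels = 1:12)

# as.data.frame() on a table yields one row per YEAR/MONTH combination,
# including combinations with a count of zero.
counts <- as.data.frame(table(YEAR = x$YEAR, MONTH = x$MONTH),
                        responseName = "OBS")

# Sort by year, then month, to match the layout requested in the question.
counts <- counts[order(counts$YEAR, counts$MONTH), ]
print(counts)
```

With MONTH stored as a factor in this way, xtabs(~ YEAR + MONTH, data = x) should produce the same cross-tabulation. Note that YEAR only covers the years actually present in the data; a year with no observations at all would need its own factor levels.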
Henrique Dallazuanna
2008-Aug-27 17:27 UTC
[R] Calculating total observations based on combinations of variable values
Try this:

merge(aggregate(x$DAY, x[, c("YEAR", "MONTH")], length),
      data.frame(YEAR = unique(x$YEAR), MONTH = 1:12),
      all = TRUE)

On Wed, Aug 27, 2008 at 2:11 PM, Josip Dasovic <jjd9@sfu.ca> wrote:
> I do need to fill out the table with all the possible combinations, even
> where there are no observations with that combination in the data set.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
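A hedged variation on the merge() suggestion above, not the poster's own code. Two things appear to need attention in the one-liner: data.frame(YEAR = unique(x$YEAR), MONTH = 1:12) recycles the years against the months rather than crossing them, and combinations with no observations come back as NA instead of 0. The sketch below swaps in expand.grid() for the full grid and fills the NAs afterwards. The toy data frame x and the column name OBS are assumptions made for illustration.

```r
# Toy data: YEAR, MONTH and DAY values copied from the sample rows in the question.
x <- data.frame(
  YEAR  = c(1994, 1994, 1994, 1994, 1994, 1994, 1995, 1995, 1995, 1995, 1995),
  MONTH = c(1, 1, 2, 2, 6, 6, 3, 3, 7, 7, 9),
  DAY   = c(22, 25, 26, 28, 28, 31, 1, 6, 15, 23, 25)
)

# Count rows per observed YEAR/MONTH pair; aggregate() names the result column "x".
obs <- aggregate(x$DAY, x[, c("YEAR", "MONTH")], length)
names(obs)[names(obs) == "x"] <- "OBS"

# expand.grid() builds every YEAR x MONTH pair (2 years x 12 months = 24 rows).
grid <- expand.grid(YEAR = unique(x$YEAR), MONTH = 1:12)

# Keep every grid row; pairs with no observations get NA, which is set to 0.
full <- merge(grid, obs, all.x = TRUE)
full$OBS[is.na(full$OBS)] <- 0

# Sort by year, then month, to match the layout requested in the question.
full <- full[order(full$YEAR, full$MONTH), ]
print(full)
```

As with any approach based on unique(x$YEAR), a year that is entirely absent from the data would not appear; passing an explicit range of years to expand.grid() would cover that case.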
hadley wickham
2008-Aug-27 19:43 UTC
[R] Calculating total observations based on combinations of variable values
On Wed, Aug 27, 2008 at 12:11 PM, Josip Dasovic <jjd9 at sfu.ca> wrote:
> I think that aggregate is probably the way to go, but there doesn't seem to
> be an appropriate summary function (FUN) available.

For this, and other related problems, you might want to look at the reshape package - http://had.co.nz/reshape

Hadley

--
http://had.co.nz/