Hello,

I have a data manipulation problem that I can easily resolve by using perl or python to pre-process the data, but I would prefer to do it directly in R.

Given, for example:

  month length ratio monthly1 monthly2
1   Jan     23   0.1        9        6
2   Jan     45   0.2        9        6
3   Jan     16   0.3        9        6
4   Feb     14   0.2        1        9
5   Mar     98   0.4        2        2
6   Mar     02   0.6        2        2

(FWIW, monthly1 and monthly2 are unchanged for each month)

I understand how to do aggregations on single fields using split and sapply, but how can I get entire lines? For example, for the maximum of data$length grouped by data$month I would like to get back some form of:

2   Jan     45   0.2        9        6
4   Feb     14   0.2        1        9
5   Mar     98   0.4        2        2

For mean, I would like to average all columns:

    Jan     28   0.2        9        6
    Feb     14   0.2        1        9
    Mar     50   0.5        2        2

Thank you,
-TAG
Todd A. Gibson
Not the most straightforward way, but it works:

    # x is the data frame from the original post.
    # Row with the largest length in each month:
    y <- lapply(split(seq_along(x$month), x$month), function(.x) {
        .max <- which.max(x$length[.x])   # position within the group
        x[.x[.max], ]                     # the whole row at that position
    })
    do.call('rbind', y)

    # Mean of every column in each month:
    y <- lapply(split(seq_along(x$month), x$month), function(.x) {
        data.frame(month = x$month[.x[1]],
                   length = mean(x$length[.x]),
                   ratio = mean(x$ratio[.x]),
                   monthly1 = mean(x$monthly1[.x]),
                   monthly2 = mean(x$monthly2[.x]))
    })
    do.call('rbind', y)

--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?
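For readers who want to run the idea above end to end, here is a minimal sketch. It assumes the example data from the original post is held in a data frame named x, and it splits the data frame itself rather than its row indices. The names pieces, max_rows and mean_rows are illustrative only and do not come from the thread.

```r
# Example data from the original post (length 02 is entered as 2).
x <- data.frame(
  month    = c("Jan", "Jan", "Jan", "Feb", "Mar", "Mar"),
  length   = c(23, 45, 16, 14, 98, 2),
  ratio    = c(0.1, 0.2, 0.3, 0.2, 0.4, 0.6),
  monthly1 = c(9, 9, 9, 1, 2, 2),
  monthly2 = c(6, 6, 6, 9, 2, 2)
)

# One smaller data frame per month.
pieces <- split(x, x$month)

# Whole row holding the largest length within each month.
max_rows <- do.call(rbind, lapply(pieces, function(d) d[which.max(d$length), ]))

# Mean of every numeric column within each month.
mean_rows <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(month = d$month[1], as.list(colMeans(d[-1])))
}))

print(max_rows)
print(mean_rows)
```

The groups come back in the sorted order of the month labels, not in the order they appear in the data.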
Gabor Grothendieck
2005-Nov-09 03:08 UTC
[R] Using split and sapply to return entire lines
Using package doBy at http://genetics.agrsci.dk/~sorenh/misc/index.html try this:

    summaryBy(cbind(length, ratio, monthly1, monthly2) ~ month, DF, max)
Gabor Grothendieck
2005-Nov-09 03:47 UTC
[R] Using split and sapply to return entire lines
Also, one can use aggregate:

    aggregate(DF[,-1], list(month = DF$month), max)
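The same aggregate call also covers the column means requested in the original post when max is swapped for mean. This is a sketch only: it assumes the example data is held in a data frame named DF, and the name means is illustrative.

```r
# Example data from the original post (length 02 is entered as 2).
DF <- data.frame(
  month    = c("Jan", "Jan", "Jan", "Feb", "Mar", "Mar"),
  length   = c(23, 45, 16, 14, 98, 2),
  ratio    = c(0.1, 0.2, 0.3, 0.2, 0.4, 0.6),
  monthly1 = c(9, 9, 9, 1, 2, 2),
  monthly2 = c(6, 6, 6, 9, 2, 2)
)

# Average every non-grouping column within each month.
means <- aggregate(DF[, -1], by = list(month = DF$month), FUN = mean)
print(means)
```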
On Tue, Nov 08, 2005 at 10:47:26PM -0500, Gabor Grothendieck wrote:
> Also, one can use aggregate:
>
> aggregate(DF[,-1], list(month = DF$month), max)

The issue here is that I need, for each month, the row that has the maximum length. That is, aggregate(.) is returning both the maximum length and the maximum ratio over all rows with month=Jan, taken column by column. I need the single row in month=Jan which has the maximum length.

Thanks,
-TAG
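The mismatch described here can be seen directly on the example data. The sketch below assumes the data frame is named DF; the names col_max, jan and is_real_row are illustrative only.

```r
# Example data from the original post (length 02 is entered as 2).
DF <- data.frame(
  month    = c("Jan", "Jan", "Jan", "Feb", "Mar", "Mar"),
  length   = c(23, 45, 16, 14, 98, 2),
  ratio    = c(0.1, 0.2, 0.3, 0.2, 0.4, 0.6),
  monthly1 = c(9, 9, 9, 1, 2, 2),
  monthly2 = c(6, 6, 6, 9, 2, 2)
)

# max is applied to each column separately within each month.
col_max <- aggregate(DF[, -1], by = list(month = DF$month), FUN = max)
jan <- col_max[col_max$month == "Jan", ]
print(jan)

# For Jan, the length comes from row 2 and the ratio from row 3,
# so the combination does not occur as a row of DF.
is_real_row <- any(DF$month == "Jan" &
                   DF$length == jan$length &
                   DF$ratio == jan$ratio)
print(is_real_row)
```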
Gabor Grothendieck
2005-Nov-09 05:57 UTC
[R] Using split and sapply to return entire lines
Try this to calculate the index, idx, of the row with the largest length in each group and then put it all together in the last line:

    f <- function(x) rownames(x)[which.max(x$length)]
    idx <- by(DF, DF$month, f)
    cbind(index = c(idx), DF[idx,])
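As an alternative sketch (not the method given in this thread), base R's ave can broadcast each month's maximum length back onto every row, and a logical comparison then keeps the matching rows. It assumes the example data is held in a data frame named DF; is_max and max_rows are illustrative names.

```r
# Example data from the original post (length 02 is entered as 2).
DF <- data.frame(
  month    = c("Jan", "Jan", "Jan", "Feb", "Mar", "Mar"),
  length   = c(23, 45, 16, 14, 98, 2),
  ratio    = c(0.1, 0.2, 0.3, 0.2, 0.4, 0.6),
  monthly1 = c(9, 9, 9, 1, 2, 2),
  monthly2 = c(6, 6, 6, 9, 2, 2)
)

# ave() returns, for every row, the maximum length of that row's month.
is_max <- DF$length == ave(DF$length, DF$month, FUN = max)

# Keep the whole rows; original row names and row order are preserved.
max_rows <- DF[is_max, ]
print(max_rows)
```

Unlike which.max, this keeps every tied row when a month has more than one row at the maximum length.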