thr3ads.net - R help - [R] Using split and sapply to return entire lines [Nov 2005]

Todd A. Gibson

2005-Nov-08 20:20 UTC

[R] Using split and sapply to return entire lines

Hello,
I have a data manipulation problem that I can easily resolve by using
perl or python to pre-process the data, but I would prefer to do it
directly in R.

Given, for example:

  month length ratio monthly1 monthly2
1 Jan   23     0.1   9        6
2 Jan   45     0.2   9        6
3 Jan   16     0.3   9        6
4 Feb   14     0.2   1        9
5 Mar   98     0.4   2        2
6 Mar   02     0.6   2        2

(FWIW, monthly1 and monthly2 are unchanged for each month)

I understand how to do aggregations on single fields using split and
sapply, but how can I get entire lines?  For example, for the maximum
of data$length grouped by data$month I would like to get back some
form of:

2 Jan 45 0.2 9 6
4 Feb 14 0.2 1 9
5 Mar 98 0.4 2 2

For mean, I would like to average all columns:

Jan 28 0.2 9 6
Feb 14 0.2 1 9
Mar 50 0.5 2 2

Thank you,
-TAG
Todd A. Gibson

Jim Holtman

2005-Nov-09 02:51 UTC

Not the most straightforward way, but it works:

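# x is the data frame; split its row indices by month
y <- lapply(split(seq_along(x$month), x$month), function(.x) {
  .max <- which.max(x$length[.x])  # position of the largest length in the group
  x[.x[.max], ]                    # return that entire row
})
do.call('rbind', y)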

y <- lapply(split(seq_along(x$month), x$month), function(.x) {
  # one-row data frame holding the group means
  data.frame(month = x$month[.x[1]], length = mean(x$length[.x]),
             ratio = mean(x$ratio[.x]), monthly1 = mean(x$monthly1[.x]),
             monthly2 = mean(x$monthly2[.x]))
})

do.call('rbind', y)
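
To try the row-selection idea above on the example from the question, the data has to exist as a data frame first. The following is only a sketch: the data frame is typed in by hand from the posted table, and the helper name rows_at_group_max is hypothetical, not something from the thread or from base R.

```r
# Example data from the original question, typed in by hand.
x <- data.frame(
  month    = c("Jan", "Jan", "Jan", "Feb", "Mar", "Mar"),
  length   = c(23, 45, 16, 14, 98, 2),
  ratio    = c(0.1, 0.2, 0.3, 0.2, 0.4, 0.6),
  monthly1 = c(9, 9, 9, 1, 2, 2),
  monthly2 = c(6, 6, 6, 9, 2, 2)
)

# Hypothetical helper: for each group, keep the whole row in which
# the column named by `value` is largest.
rows_at_group_max <- function(df, group, value) {
  idx <- sapply(split(seq_len(nrow(df)), df[[group]]), function(i) {
    i[which.max(df[[value]][i])]  # row number of the group's maximum
  })
  df[sort(idx), ]                 # sort() restores the original row order
}

res <- rows_at_group_max(x, "month", "length")
print(res)
```

Because which.max() returns only the first maximum, a group with tied lengths contributes a single row here.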



--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?

Gabor Grothendieck

2005-Nov-09 03:08 UTC

Using package doBy at http://genetics.agrsci.dk/~sorenh/misc/index.html
try this:

summaryBy(cbind(length, ratio, monthly1, monthly2) ~ month, DF, max)



Gabor Grothendieck

2005-Nov-09 03:47 UTC

Also, one can use aggregate:

aggregate(DF[,-1], list(month = DF$month), max)
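
For the second half of the question (averaging all columns per month), the same aggregate() call with mean in place of max seems to be enough. A minimal sketch, assuming DF is the example table typed in by hand; converting month to a factor with levels from the built-in month.abb is an added assumption, used only to get calendar rather than alphabetical order.

```r
# Example data from the original question, typed in by hand.
DF <- data.frame(
  month    = c("Jan", "Jan", "Jan", "Feb", "Mar", "Mar"),
  length   = c(23, 45, 16, 14, 98, 2),
  ratio    = c(0.1, 0.2, 0.3, 0.2, 0.4, 0.6),
  monthly1 = c(9, 9, 9, 1, 2, 2),
  monthly2 = c(6, 6, 6, 9, 2, 2)
)

# Group by month in calendar order; months without data are dropped.
grp <- factor(DF$month, levels = month.abb)

# Column-wise mean of every non-month column within each month.
means <- aggregate(DF[, -1], by = list(month = grp), FUN = mean)
print(means)
```

On the posted data this gives one row per month whose values agree with the means requested in the question.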

Todd A. Gibson

2005-Nov-09 05:33 UTC

On Tue, Nov 08, 2005 at 10:47:26PM -0500, Gabor Grothendieck wrote:
> Also, one can use aggregate:
>
> aggregate(DF[,-1], list(month = DF$month), max)

The issue here is that I need the row corresponding to the maximum
length within each month.  That is, aggregate(.) is returning both the
maximum length and the maximum ratio over all rows with month=Jan, and
these can come from different rows.  I need the single row in
month=Jan which has the maximum length.

Thanks,
-TAG
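
The difference described above can be seen on the example data. This is only an illustrative sketch, assuming DF is the posted table typed in by hand; the variable names col_max, jan_max and jan_row are made up for the illustration.

```r
# Example data from the original question, typed in by hand.
DF <- data.frame(
  month    = c("Jan", "Jan", "Jan", "Feb", "Mar", "Mar"),
  length   = c(23, 45, 16, 14, 98, 2),
  ratio    = c(0.1, 0.2, 0.3, 0.2, 0.4, 0.6),
  monthly1 = c(9, 9, 9, 1, 2, 2),
  monthly2 = c(6, 6, 6, 9, 2, 2)
)

# Column-wise maxima: each column is maximised independently.
col_max <- aggregate(DF[, -1], by = list(month = DF$month), FUN = max)
jan_max <- col_max[col_max$month == "Jan", ]
print(jan_max)   # length comes from row 2, ratio from row 3

# What was asked for: the one January row with the largest length.
jan <- DF[DF$month == "Jan", ]
jan_row <- jan[which.max(jan$length), ]
print(jan_row)   # row 2, with its own ratio
```

For January the aggregate() result pairs length 45 with ratio 0.3, a combination that does not occur in any single row of the data.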
Gabor Grothendieck

2005-Nov-09 05:57 UTC

Try this to calculate the index, idx, of the largest in each group
and then put it all together in the last line:

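# f returns the row name of the max-length row within one group
f <- function(x) rownames(x)[which.max(x$length)]
# idx holds one row name per month
idx <- by(DF, DF$month, f)
cbind(index = c(idx), DF[idx, ])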
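
Another base-R way to get at the same rows, offered only as a sketch and not something proposed in the thread, is to compare each length with its group maximum computed by ave(). DF is again assumed to be the example table typed in by hand; is_group_max and top_rows are names made up here.

```r
# Example data from the original question, typed in by hand.
DF <- data.frame(
  month    = c("Jan", "Jan", "Jan", "Feb", "Mar", "Mar"),
  length   = c(23, 45, 16, 14, 98, 2),
  ratio    = c(0.1, 0.2, 0.3, 0.2, 0.4, 0.6),
  monthly1 = c(9, 9, 9, 1, 2, 2),
  monthly2 = c(6, 6, 6, 9, 2, 2)
)

# ave() returns, for every row, the maximum length of that row's month.
is_group_max <- DF$length == ave(DF$length, DF$month, FUN = max)

# Keep the complete rows that hold their group's maximum.
top_rows <- DF[is_group_max, ]
print(top_rows)
```

Unlike which.max(), this keeps every tied row when a month has more than one row at the maximum length, which may or may not be what is wanted.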
