thr3ads.net - R help - [R] apply mean function to a subset of data [Apr 2016]

Pedro Mardones

2016-Apr-02 18:46 UTC

[R] apply mean function to a subset of data

Dear all;

This must have a rather simple answer, but I haven't been able to figure it
out: I have a data frame with, say, 2 groups (group 1 & 2). I want to select,
say, "n" rows from group 1 and calculate the mean; then select "m" rows from
group 2 and calculate the mean as well. So far I've been using a for loop
to do it, but with a large data set it is rather inefficient.
Any hint to vectorize this would be appreciated.

toy = data.frame(group = c(rep(1,10),rep(2,8)), diam = c(rnorm(10),rnorm(8)))
nsel = c(6,4)
smean <- c(0,0)
for (i in 1:2)  smean[i] <- mean(toy$diam[1:nsel[i]])

Thanks

Pedro


Boris Steipe

2016-Apr-02 19:48 UTC

Your toy code does not reproduce what you describe: mean(toy$diam[1:nsel[i]])
selects from the first rows of the data frame both times, and those all belong
to group 1. You probably want to subset like toy$diam[toy$group == i]. Also, if
there is any real inefficiency here, it is _not_ because you are executing a
for-loop for two iterations. What makes you think you have an efficiency
problem?


B.
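
To make the point above concrete, here is a minimal sketch of what the two indexing styles actually pick up. The set.seed() call and the variable names second_original and second_fixed are assumptions added for illustration; they are not part of the original post.

```r
# Toy data as in the original post; the seed is an assumption for repeatability.
set.seed(1)
toy <- data.frame(group = c(rep(1, 10), rep(2, 8)),
                  diam = c(rnorm(10), rnorm(8)))
nsel <- c(6, 4)

# Original indexing for i = 2: rows 1..4 of the whole data frame,
# which all belong to group 1.
second_original <- toy$diam[1:nsel[2]]
print(toy$group[1:nsel[2]])

# Group-aware indexing for i = 2: restrict to group 2 first,
# then take the first nsel[2] values within that group.
second_fixed <- toy$diam[toy$group == 2][1:nsel[2]]
print(second_fixed)
```

With the rows ordered as in the toy data, second_fixed corresponds to rows 11 to 14 of the data frame, whereas second_original reuses rows 1 to 4.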

Jim Lemon

2016-Apr-02 23:13 UTC

Hi Pedro,
This may not be much of an improvement, but it was a challenge.

selvec<-as.vector(matrix(c(nsel,unlist(by(toy$diam,toy$group,length))-nsel),
 ncol=2,byrow=TRUE))
TFvec<-rep(c(TRUE,FALSE),length.out=length(selvec))
toynsel<-rep(TFvec,selvec)
by(toy[toynsel,]$diam,toy[toynsel,]$group,mean)

Jim
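
The run-length idea in the code above can be unpacked step by step. The following is only a sketch of the same technique, not a replacement for it; the names sizes, counts and mask are hypothetical, the seed is an assumption, and it relies on the rows being sorted so that each group is contiguous.

```r
# Toy data as in the original post; the seed is an assumption for repeatability.
set.seed(1)
toy <- data.frame(group = c(rep(1, 10), rep(2, 8)),
                  diam = c(rnorm(10), rnorm(8)))
nsel <- c(6, 4)

# Number of rows in each group (10 and 8 for the toy data).
sizes <- as.vector(table(toy$group))

# Interleave "rows to keep" and "rows to skip" per group:
# keep 6, skip 4, keep 4, skip 4.
counts <- as.vector(rbind(nsel, sizes - nsel))

# Expand the alternating TRUE/FALSE pattern into one logical mask per row.
# This assumes the data frame is ordered by group.
mask <- rep(rep(c(TRUE, FALSE), length(nsel)), counts)

# Group means over the selected rows only.
print(tapply(toy$diam[mask], toy$group[mask], mean))
```

The mask avoids an explicit loop over groups, at the cost of depending on the row order.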

David L Carlson

2016-Apr-03 20:44 UTC

Here are several ways to get there, but your original loop is fine once it is
corrected:

> for (i in 1:2)  smean[i] <- mean(toy$diam[toy$group==i][1:nsel[i]])
> smean
[1] 0.271489 1.117015

Using sapply() to hide the loop:

> smean <- sapply(1:2, function(x) mean((toy$diam[toy$group==x])[1:nsel[x]]))
> smean
[1] 0.271489 1.117015

Or use head():

> smean <- sapply(1:2, function(x) mean(head(toy$diam[toy$group==x], nsel[x])))
> smean
[1] 0.271489 1.117015

Or mapply() instead of sapply():

> smean <- mapply(function(x, y) mean(head(x, y)), x=split(toy$diam, toy$group), y=nsel)
> smean
       1        2 
0.271489 1.117015

------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University
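
For completeness, here is one more possible route that avoids both the explicit loop and the apply family for the selection step. It is a sketch under assumptions, not something proposed in the thread: it numbers the rows within each group using ave() and keeps those whose position does not exceed the group's entry in nsel. The names groups, pos and keep are hypothetical, and the seed is added only for repeatability.

```r
# Toy data as in the original post; the seed is an assumption for repeatability.
set.seed(42)
toy <- data.frame(group = c(rep(1, 10), rep(2, 8)),
                  diam = c(rnorm(10), rnorm(8)))
nsel <- c(6, 4)

# Group labels in the order that nsel is assumed to follow.
groups <- sort(unique(toy$group))

# Position of each row within its own group (1, 2, 3, ... per group).
pos <- ave(seq_along(toy$group), toy$group, FUN = seq_along)

# Keep a row if its within-group position is at most that group's nsel.
keep <- pos <= nsel[match(toy$group, groups)]

# Mean of the kept rows, by group.
smean <- tapply(toy$diam[keep], toy$group[keep], mean)
print(smean)
```

Unlike the run-length mask, this does not require the rows to be sorted by group, because the position is computed within each group regardless of where its rows sit.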
