Dear all;

This must have a rather simple answer, but I haven't been able to figure it out: I have a data frame with, say, 2 groups (group 1 & 2). I want to select "n" rows from group 1 and calculate the mean, then select "m" rows from group 2 and calculate the mean as well. So far I've been using a for loop to do it, but on a large data set it is rather inefficient. Any hint on how to vectorize this would be appreciated.

toy = data.frame(group = c(rep(1,10),rep(2,8)), diam = c(rnorm(10),rnorm(8)))
nsel = c(6,4)
smean <- c(0,0)
for (i in 1:2) smean[i] <- mean(toy$diam[1:nsel[i]])

Thanks

Pedro
Your toy code does not reproduce what you describe: mean(toy$diam[1:nsel[i]]) selects from elements of group 1 both times. You probably want to subset like toy$diam[toy$group == i].

Also, if there is any real inefficiency here, it is _not_ because you are executing a for-loop for two iterations. What makes you think you have an efficiency problem?

B.

On Apr 2, 2016, at 2:46 PM, Pedro Mardones <mardones.p at gmail.com> wrote:
> for (i in 1:2) smean[i] <- mean(toy$diam[1:nsel[i]])
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
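A minimal sketch of the indexing problem described above. The fixed seed (set.seed(1)) and the helper name rows_used are assumptions added for illustration; they are not in the original post.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
toy <- data.frame(group = c(rep(1, 10), rep(2, 8)),
                  diam  = c(rnorm(10), rnorm(8)))
nsel <- c(6, 4)

# With the original indexing, the "group 2" mean uses rows 1..4,
# and every one of those rows belongs to group 1.
rows_used <- 1:nsel[2]
unique(toy$group[rows_used])

# Subsetting by group first, as suggested, reaches the intended rows.
group2 <- toy$diam[toy$group == 2]
mean(group2[1:nsel[2]])
```

The second mean is taken over rows 11 to 14 of toy, which are the first four rows of group 2.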
Hi Pedro,
This may not be much of an improvement, but it was a challenge.

selvec<-as.vector(matrix(c(nsel,unlist(by(toy$diam,toy$group,length))-nsel),
 ncol=2,byrow=TRUE))
TFvec<-rep(c(TRUE,FALSE),length.out=length(selvec))
toynsel<-rep(TFvec,selvec)
by(toy[toynsel,]$diam,toy[toynsel,]$group,mean)

Jim
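The run-length idea behind the code above can be unpacked step by step. This is only a sketch of the same technique with different helper calls (table(), rbind(), tapply()) and hypothetical names (grp_n, runs, keep); the seed is also an assumption. Like the original, it relies on the rows being sorted by group.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
toy <- data.frame(group = c(rep(1, 10), rep(2, 8)),
                  diam  = c(rnorm(10), rnorm(8)))
nsel <- c(6, 4)

# Number of rows in each group, in group order: 10 and 8
grp_n <- as.vector(table(toy$group))

# Interleave "take" and "skip" run lengths per group:
# take 6, skip 4 (group 1), then take 4, skip 4 (group 2)
runs <- as.vector(rbind(nsel, grp_n - nsel))

# Expand the run lengths into a logical mask over the rows
keep <- rep(rep(c(TRUE, FALSE), length.out = length(runs)), times = runs)

# Grouped mean over the selected rows only
tapply(toy$diam[keep], toy$group[keep], mean)
```

If the data frame is not ordered by group, the mask no longer lines up with the groups, so sorting first (or using a per-row position within each group) would be needed.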
Here are several ways to get there, but your original loop is fine once it is corrected:

> for (i in 1:2) smean[i] <- mean(toy$diam[toy$group==i][1:nsel[i]])
> smean
[1] 0.271489 1.117015

Using sapply() to hide the loop:

> smean <- sapply(1:2, function(x) mean((toy$diam[toy$group==x])[1:nsel[x]]))
> smean
[1] 0.271489 1.117015

Or use head():

> smean <- sapply(1:2, function(x) mean(head(toy$diam[toy$group==x], nsel[x])))
> smean
[1] 0.271489 1.117015

Or mapply() instead of sapply():

> smean <- mapply(function(x, y) mean(head(x, y)), x=split(toy$diam, toy$group), y=nsel)
> smean
       1        2
0.271489 1.117015

------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Jim Lemon
Sent: Saturday, April 2, 2016 6:14 PM
To: Pedro Mardones <mardones.p at gmail.com>
Cc: r-help mailing list <r-help at r-project.org>
Subject: Re: [R] apply mean function to a subset of data
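For many groups, one further option is to avoid a per-group function call for the selection step altogether. The following is a hedged sketch, not something proposed in the thread: it assumes nsel is named by group label, uses a fixed seed, and introduces the hypothetical names pos and keep. It uses base R's ave() to number rows within each group and tapply() for the grouped mean.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
toy <- data.frame(group = c(rep(1, 10), rep(2, 8)),
                  diam  = c(rnorm(10), rnorm(8)))

# Assumption: the per-group row counts are named by group label
nsel <- c("1" = 6, "2" = 4)

# Position of each row within its own group: 1, 2, 3, ... restarting per group
pos <- ave(seq_along(toy$group), toy$group, FUN = seq_along)

# Keep a row when its within-group position does not exceed that group's count
keep <- pos <= nsel[as.character(toy$group)]

# One grouped mean over the kept rows
smean <- tapply(toy$diam[keep], toy$group[keep], mean)
smean
```

Because the position is computed within each group, this version does not require the rows to be sorted by group; it takes the first nsel rows of each group in row order.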