Hi,

I want to select a subsample from my data, choosing one record from each group. I know how to do this with a for loop.

For example, let's say I have the data:

    A = cbind(rnorm(100), runif(100), rep(c(1, 2, 3, 4, 5), 20))

where the third column is the group variable. What I want is to select 5 observations, one taken at random from each group:

    INDEX = NULL
    i = 1
    for (index_g in unique(A[, 3])) {
      INDEX[i] = sample(which(A[, 3] == index_g), 1)
      i = i + 1
    }
    SEL = A[INDEX, ]

Is there a way to do this without a "for" loop? In other words, is there a way to "vectorize" this?

Thank you,

Mauricio Romero
Quantil S.A.S.
Bogotá, Colombia
www.quantil.com.co

"It is from the earth that we must find our substance; it is on the earth that we must find solutions to the problems that promise to destroy all life here"
Hello,

There are probably many ways to do this, but I think it's easier if you use a data.frame as your object. An easy solution for the matrix you provide is escaping me at the moment.

One solution, using the plyr package:

    library(plyr)
    A <- data.frame(a = rnorm(100), b = runif(100), c = rep(c(1, 2, 3, 4, 5), 20))
    ddply(A, .(c), function(x) x[sample(1:nrow(x), 1), ])

                a         b c
    1  0.02995847 0.4763819 1
    2  0.72035194 0.2948611 2
    3  1.34963917 0.2057488 3
    4 -1.99427160 0.1147923 4
    5 -0.73612703 0.5889539 5
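For the original matrix, one base-R possibility is to split the row numbers by the group column and draw one row number per group. This is only a sketch on the toy data from the question. `pick_one` is a helper name made up for this example; it is there because `sample(x, 1)` treats a single number `x` as `1:x`, which would bite if a group ever had exactly one row. `set.seed()` is added only to make the run reproducible.

```r
set.seed(1)  # assumption: fixed seed, only so the example is reproducible
A <- cbind(rnorm(100), runif(100), rep(c(1, 2, 3, 4, 5), 20))

# Hypothetical helper: draw one element of x, safe even when length(x) == 1
pick_one <- function(x) x[sample.int(length(x), 1)]

# Row numbers split by the group column, then one row number drawn per group
INDEX <- vapply(split(seq_len(nrow(A)), A[, 3]), pick_one, integer(1))
SEL <- A[INDEX, ]
```

Note that `vapply()` still loops over the groups internally, so this hides the "for" rather than removing it; with only a handful of groups that is unlikely to matter.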
Henrique Dallazuanna
2010-Oct-13 20:23 UTC
[R] vectorizing: selecting one record per group
Try this:

    A <- data.frame(V1 = rnorm(100), V2 = runif(100), V3 = rep(c(1, 2, 3, 4, 5), 20))
    do.call(cbind, lapply(aggregate(. ~ V3, A, FUN = sample, size = 5), c))

On Wed, Oct 13, 2010 at 5:01 PM, Mauricio Romero <mauricio.romero@quantil.com.co> wrote:
> Is there a way to do this without a "for"? In other words, is there a way to
> "vectorize" this?

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
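One caveat on the `aggregate()` call: as far as I can tell, `FUN` is applied to each column separately within each group, so V1 and V2 are sampled independently of each other and the values in a returned row need not come from the same original record; `size = 5` also draws five values per group, not one. If whole records are wanted, a fully vectorized sketch is to shuffle the rows once and keep the first row seen for each group, which gives each row in a group the same chance of being picked. The names `perm` and `shuffled` are made up for this example, and the seed is fixed only for reproducibility.

```r
set.seed(1)  # assumption: fixed seed, only so the example is reproducible
A <- data.frame(V1 = rnorm(100), V2 = runif(100), V3 = rep(c(1, 2, 3, 4, 5), 20))

perm <- sample.int(nrow(A))                  # random permutation of the row numbers
shuffled <- A[perm, ]                        # rows in random order, records kept intact
SEL <- shuffled[!duplicated(shuffled$V3), ]  # first row encountered for each group
SEL <- SEL[order(SEL$V3), ]                  # optional: sort the result by group
```

The row names of `SEL` are the original row numbers in `A`, so it is easy to check which records were chosen.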