Hi,
I want to select a subsample from my data, choosing one record from each
group. I know how to do this with a for loop.
For example, let's say I have the data:
A=cbind(rnorm(100),runif(100),(rep(c(1,2,3,4,5),20)))
Here the third column is the group variable. What I want is to select
5 observations, one taken at random from each group.
INDEX = NULL
i = 1
for (index_g in unique(A[,3])) {
  INDEX[i] = sample(which(A[,3] == index_g), 1)
  i = i + 1
}
SEL = A[INDEX,]
Is there a way to do this without a "for"? In other words, is there a way to
"vectorize" this?
Thank you,
Mauricio Romero
Quantil S.A.S.
Bogotá,Colombia
www.quantil.com.co
"It is from the earth that we must find our substance; it is on the earth
that we must find solutions to the problems that promise to destroy all life
here"
Hello,
There are probably many ways to do this, but I think
it's easier if you use a data.frame as your object.
The easy solution for the matrix you provide is escaping
me at the moment.
One solution, using the plyr package:
library(plyr)
A <- data.frame(a = rnorm(100),b = runif(100), c = rep(c(1,2,3,4,5),20))
ddply(A, .(c), function(x) x[sample(1:nrow(x), 1), ])
            a         b c
1  0.02995847 0.4763819 1
2  0.72035194 0.2948611 2
3  1.34963917 0.2057488 3
4 -1.99427160 0.1147923 4
5 -0.73612703 0.5889539 5
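For the original matrix, one possible base R sketch is to split the row numbers by group and pick one row number from each piece. This is only an illustration, not necessarily the "easy solution" alluded to above: the helper name pick_one and the set.seed() call are my own additions, and vapply() still loops internally, so it removes the explicit "for" rather than truly vectorizing.

```r
set.seed(1)  # assumption: only here so the sketch is reproducible
A <- cbind(rnorm(100), runif(100), rep(c(1, 2, 3, 4, 5), 20))

# Hypothetical helper: pick one element of 'rows' at random.
# rows[sample.int(length(rows), 1)] is used instead of sample(rows, 1)
# because sample(x, 1) draws from 1:x when x is a single number.
pick_one <- function(rows) rows[sample.int(length(rows), 1)]

# Split the row numbers by the group column, then pick one per group.
INDEX <- vapply(split(seq_len(nrow(A)), A[, 3]), pick_one, integer(1))
SEL <- A[INDEX, ]
```

The same single-element pitfall of sample() applies to the loop in the question if a group ever contains only one row.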
Henrique Dallazuanna
2010-Oct-13 20:23 UTC
[R] vectorizing: selecting one record per group
Try this:
A <- data.frame(V1 = rnorm(100), V2 = runif(100), V3 = rep(c(1,2,3,4,5),20))
do.call(cbind, lapply(aggregate(. ~ V3, A, FUN = sample, size = 5), c))
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
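One caveat, as far as I can tell: aggregate() applies FUN to each column separately within a group, so the sampled V1 and V2 values need not come from the same original row. If whole records are wanted, a loop-free sketch is to shuffle the row numbers once and keep the first row seen for each group. The names perm and first, and the set.seed() call, are illustrative assumptions, not part of the original posts.

```r
set.seed(1)  # assumption: only here so the sketch is reproducible
A <- data.frame(V1 = rnorm(100), V2 = runif(100), V3 = rep(c(1, 2, 3, 4, 5), 20))

perm <- sample(nrow(A))            # random permutation of the row numbers
first <- !duplicated(A$V3[perm])   # TRUE at the first row of each group in shuffled order
SEL <- A[perm[first], ]            # one complete record per group
SEL <- SEL[order(SEL$V3), ]        # optional: sort the result by group
```

Because the permutation is uniform, each row within a group is equally likely to come first, so this should be equivalent to drawing one row at random per group. The same idea works on the original matrix with A[, 3] in place of A$V3.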