If I understand you correctly, you have a variable that groups each
observation into one of eight categories, and there are several hundred
observations in each category. Now you want to sample only 100
observations from each category. If this is right, then the following
might help:
set.seed(123)
num <- rnorm( 1200 ) # response variable
g <- sample( LETTERS[1:8], 1200, replace=TRUE ) # grouping variable
table(g)
  A   B   C   D   E   F   G   H
146 153 131 166 140 164 163 137
You can either store a list of 100 randomly sampled indexes from each
category (indexList) or store the sampled values themselves (valueList):
indexList <- tapply( 1:length(g), g, function(x) sample(x, 100) )
valueList <- tapply( num, g, function(x) sample(x, 100) )
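If your data sit in a data frame, the index list can be used to pull out
the sampled rows in one go. This is only a sketch: the data frame "dat"
and its column names are made up for illustration, and it assumes every
group has at least 100 rows.

```r
set.seed(123)
g   <- sample(LETTERS[1:8], 1200, replace = TRUE)  # grouping variable
num <- rnorm(1200)                                 # response variable
dat <- data.frame(g = g, num = num)                # hypothetical data frame

# 100 row indexes per group, as in indexList above
indexList <- tapply(seq_along(g), g, function(x) sample(x, 100))

# stack the sampled rows from all eight groups into one data frame
sub <- dat[unlist(indexList), ]
table(sub$g)   # every group should now have exactly 100 rows
```

Keeping the indexes rather than the values means you can retrieve any
other column for the same sampled rows later.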
The first is easier to double check with
for(i in 1:8) print(mean(g[ unlist(indexList[[i]]) ] == LETTERS[i]))
If you only want a summary of these 100 sampled values, then you do
not need to store any index or value; you can calculate the summary
directly. For example, let's say the median:
tapply( num, g, function(x) median( sample(x, 100) ) )
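If what you need is instead a total volume of about 100 per group, as
the code you quoted suggests, the same idea can be applied group by
group. The following is a rough sketch under assumptions of mine: the
column names "group" and "volume", the simulated data and the helper
sampleByVolume() are all hypothetical, and sample() stands in for your
sort(runif(k)) step as a way of shuffling the rows.

```r
set.seed(123)
# hypothetical stand-in for your data: a group label and a volume per row
dataset <- data.frame(group  = sample(LETTERS[1:8], 1200, replace = TRUE),
                      volume = runif(1200, min = 0.5, max = 3))

# hypothetical helper: draw rows in random order until the running
# total of volume first reaches the target
sampleByVolume <- function(d, target = 100) {
  ix <- sample(nrow(d))                    # random order of the rows
  cv <- cumsum(d$volume[ix])               # running total of volume
  M  <- min(which(cv >= target), nrow(d))  # first row reaching the target
  d[ix[seq_len(M)], ]
}

# apply the helper to each group and stack the results
test <- do.call(rbind, lapply(split(dataset, dataset$group), sampleByVolume))
tapply(test$volume, test$group, sum)       # total sampled volume per group
```

If a group holds less than the target volume in total, this version
simply returns all of its rows.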
Hope this helps, Adai
On Sun, 2004-08-15 at 04:32, Robert Waters wrote:
> Dear R-users
>
> I'm working with a dataset that contains information
> for 8 groups of data and I need to select a sample of
> certain size (100 cubic feet by group) from this
> database for each of these 8 groups. To clarify, here
> is the starting code I'm working with:
>
> k<-nrow(dataset)
> ix<-sort(runif(k),index.return=TRUE)$ix
> M<-max(which(cumsum(dataset$volume[ix])<100))+1
> test<-dataset[ix[1:M],]
>
> However, I don't know how to specify in this code the
> instruction: "by groups"
>
> Does anyone have an idea how to do this?
>
> Thanks in advance
>
> RW
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>