thr3ads.net - R help - [R] index and by groups statement [Aug 2004]


Robert Waters

2004-Aug-15 03:32 UTC

[R] index and by groups statement

Dear R-users

I'm working with a dataset that contains information
for 8 groups of data, and I need to select a sample of
a certain size (100 cubic feet per group) from this
database for each of these 8 groups. To clarify, here
is the starting code I'm working with:

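k  <- nrow(dataset)
ix <- sort(runif(k), index.return = TRUE)$ix             # random ordering of the rows
M  <- max(which(cumsum(dataset$volume[ix]) < 100)) + 1   # rows needed to reach 100 cubic feet
test <- dataset[ix[1:M], ]                               # the sampled rows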

However, I don't know how to make this code work "by
groups", i.e. repeat the selection separately for each group.

Does anyone have an idea how to do this?

Thanks in advance

RW
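For reference, the single-group step in the code above can be written as a small function. This is a minimal sketch, not the poster's code: the helper name `take_volume()` and the simulated data are hypothetical, and `volume` is assumed to be positive. `sample.int(nrow(d))` gives a random permutation just like `sort(runif(k), index.return=TRUE)$ix`, and taking the first position where the running total reaches 100 is the same cut-off as `max(which(cumsum(...) < 100)) + 1`, except that it also copes with a very first row that already exceeds 100 and with a total volume under 100.

```r
# Hypothetical helper: draw rows of `d` in random order until their
# cumulative `volume` first reaches `target` (100 cubic feet in the question).
take_volume <- function(d, target = 100) {
  ix <- sample.int(nrow(d))         # random permutation of the row indices
  cs <- cumsum(d$volume[ix])        # running total of volume in that order
  M  <- which(cs >= target)[1]      # first row at which the total reaches target
  if (is.na(M)) M <- nrow(d)        # total volume below target: take every row
  d[ix[seq_len(M)], , drop = FALSE] # drop = FALSE keeps a one-column data frame
}

# Small made-up example (hypothetical data)
set.seed(1)
dataset <- data.frame(volume = runif(50, 5, 15))
test <- take_volume(dataset)
sum(test$volume)   # at least 100, and less than 100 plus the last row's volume
```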

Adaikalavan Ramasamy

2004-Aug-15 04:28 UTC


If I understand you correctly, you have a variable that assigns each
observation to one of eight categories, and there are several hundred
observations in each category. Now you want to sample only 100
observations from each category. If this is right, then the following
might help:

   set.seed(123)
   g   <- sample( LETTERS[1:8], 1200, replace=TRUE ) # grouping variable
   num <- rnorm( length(g) )                         # response variable
   table(g)
      A   B   C   D   E   F   G   H 
    146 153 131 166 140 164 163 137 


You can either store a list of 100 randomly sampled indices from each
category (indexList) or store the sampled values themselves (valueList):

   indexList <- tapply( 1:length(g), g, function(x) sample(x, 100) )  
   valueList <- tapply( num, g, function(x) sample(x, 100) )

The first is easier to double-check; every line printed should be 1:
   for(i in 1:8) print(mean(g[ unlist(indexList[[i]]) ] == LETTERS[i]))
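If what you need in the end is a data frame of the sampled rows (like `test` in the original question), the per-group indices can be flattened with `unlist()` and used to subset. A minimal sketch on the same simulated data; the data frame name `dat` is hypothetical.

```r
# Rebuild the simulated data from the reply
set.seed(123)
g   <- sample(LETTERS[1:8], 1200, replace = TRUE)  # grouping variable
num <- rnorm(length(g))                            # response variable
dat <- data.frame(g = g, num = num)                # hypothetical data frame

# 100 sampled row indices per group, as in the reply
indexList <- tapply(seq_along(g), g, function(x) sample(x, 100))

# One data frame holding the 8 x 100 sampled rows
sampled <- dat[unlist(indexList), ]
table(sampled$g)   # 100 in every group
```

Note that `sample(x, 100)` assumes each group has at least 100 observations; it stops with an error otherwise.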


If you only want a summary of these 100 sampled values, you do not
need to store any indices or values; you can calculate the summary
directly. For example, let's say the median:
 
   tapply( num, g, function(x) median( sample(x, 100) ) )


Hope this helps, Adai
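The reply above draws a fixed 100 observations per group, whereas the question asked for roughly 100 cubic feet of `volume` per group. A minimal sketch that applies the asker's own cumulative-volume rule within each group, using `split()` and `lapply()`; the helper `take_volume()`, the column name `group` and the simulated data are hypothetical, and `volume` is assumed to be positive.

```r
# Hypothetical helper (same rule as the asker's code): random row order,
# keep rows until cumulative volume first reaches `target`.
take_volume <- function(d, target = 100) {
  ix <- sample.int(nrow(d))
  M  <- which(cumsum(d$volume[ix]) >= target)[1]
  if (is.na(M)) M <- nrow(d)          # group holds less than `target` in total
  d[ix[seq_len(M)], , drop = FALSE]
}

# Made-up example data: 8 groups, a `volume` column in cubic feet
set.seed(42)
dataset <- data.frame(
  group  = sample(LETTERS[1:8], 400, replace = TRUE),
  volume = runif(400, 2, 20)
)

# Split by group, sample each piece, and stack the results
pieces <- lapply(split(dataset, dataset$group), take_volume)
test   <- do.call(rbind, pieces)

tapply(test$volume, test$group, sum)  # each group total is at least 100
```

Base R's `by(dataset, dataset$group, take_volume)` is another way to apply the same function to each group's subset.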




