thr3ads.net - R help - [R] mean of subset of rows [Oct 2007]


darteta001 at ikasle.ehu.es

2007-Oct-01 15:57 UTC

[R] mean of subset of rows

Dear list, 
this must be an easy one:

I have a data.frame of two columns, "ID" with four different levels (A
to D) and numerical "size", and each of the 4 different IDs is repeated
a different number of times. I would like to get the mean size for each
ID as another data.frame. I have tried the following:

> ID = as.character(unique(data[,1])) # I use unique() because "data" will be larger in future
> nIDs = length(ID)
> for(i in 1:nIDs){
+  subdata = subset(data,V1==ID[i])
+  average = as.data.frame(cbind(1:i,ID[i],mean(subdata[,2])))
+ }

Unfortunately, my output only gets the last level of ID four times:

> average
  V1 V2               V3
1  1  D 179.777777777778
2  2  D 179.777777777778
3  3  D 179.777777777778
4  4  D 179.777777777778

How can I get what I need? There might be an easier way to do it, but
I guess my skills aren't that good. Any suggestions are welcome.

Regards,

David

joris.dewolf at cropdesign.com

2007-Oct-01 16:16 UTC


data <- data.frame(ID = rep(letters[1:4],5),size=rnorm(20,0,1))


aggregate(data$size, by = list(data$ID),mean)
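As a hedged aside: aggregate() also has a formula interface that keeps the original column names, which may be closer to the data.frame asked for in the question. A minimal sketch, assuming the columns are named ID and size as in the example above (the name "means" and the seed are only for illustration):

```r
# Example data: four IDs, each repeated five times (column names assumed)
set.seed(1)
data <- data.frame(ID = rep(letters[1:4], 5), size = rnorm(20, 0, 1))

# Formula interface: mean of size for each level of ID
means <- aggregate(size ~ ID, data = data, FUN = mean)

# The result is itself a data.frame with columns ID and size
means
```

With the by = list(...) form used above, the grouping column comes back as Group.1 and the value column as x, so the formula form mainly saves a renaming step.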










                                                                           
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

John Kane

2007-Oct-01 16:42 UTC


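dfnames <- c("id","v1")

# Build a small example data.frame: a factor id and a numeric v1
mydata <- data.frame(id = as.factor(c("a","a","b","c","c","b")),
                     v1 = c(2,3,3,2,2,4))
names(mydata) <- dfnames
mydata

# Mean of v1 for each id; 'by' must be a list of grouping variables
mysums <- aggregate(mydata[2], by = list(mydata$id), FUN = mean)
names(mysums) <- dfnames
mysums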


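I am not exactly sure what is happening in that loop, but you have no
place to store the results of each iteration: average is overwritten
on every pass, so only the last one is kept.

The loop below should work, but you are much better off using the
aggregate command. Explicit for loops are generally discouraged in R
when a vectorized function does the job. Good luck.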

data <- mydata
ID <- as.character(unique(data[,1]))
nIDs <- length(ID)
# Pre-allocate one slot per ID so each iteration has somewhere to store its mean
average <- matrix(NA, nrow=nIDs, ncol=1)
for(i in 1:nIDs){
  subdata <- subset(data, id==ID[i])
  average[i] <- mean(subdata[,2])
}

average
newdata <- data.frame(ID, average)
names(newdata) <- dfnames
newdata
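
For anyone wondering why the original loop printed D four times, here is a small sketch of what seems to be going on. The toy data and the column names V1 and V2 are assumptions made only for this illustration; they are not the poster's real data.

```r
# Toy data with assumed column names V1 (ID) and V2 (size)
data <- data.frame(V1 = c("A", "A", "B", "C", "D", "D"),
                   V2 = c(1, 3, 5, 7, 9, 11))
ID <- as.character(unique(data[, 1]))

# Original pattern: 'average' is reassigned on every pass, so only the
# result of the last iteration (i = 4, ID "D") is left after the loop.
for (i in seq_along(ID)) {
  subdata <- subset(data, V1 == ID[i])
  # cbind() recycles the single ID and the single mean against 1:i,
  # which is why the final result has i identical-looking rows
  average <- as.data.frame(cbind(1:i, ID[i], mean(subdata[, 2])))
}
average
```

So the four rows are not four groups: they are the last group's ID and mean, recycled to match the length of 1:4. Indexing into a pre-allocated object, as in the loop above, avoids both problems.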

Jeffrey Robert Spies

2007-Oct-01 16:42 UTC


You were on the right track with the for loop, but often you can do  
the same thing looplessly (I know, it's not really a word) in R:

If your data is like this:

data<-data.frame(ID=rep(letters[1:4], 5), size=runif(20))

then apply either

tapply(data$size, data$ID, mean)

or

aggregate(data$size, list(data$ID), mean)

For further reference, section 4.2 in "An Introduction to R"  
describes using tapply in this way.
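
Since the question asked for the result as another data.frame, note that tapply() returns a named one-dimensional array rather than a data.frame. One possible way to convert it is sketched below; the names "m" and "result" and the seed are only for illustration.

```r
set.seed(1)
data <- data.frame(ID = rep(letters[1:4], 5), size = runif(20))

# tapply() returns the group means as a named 1-d array
m <- tapply(data$size, data$ID, mean)

# Rebuild a data.frame from the group names and the mean values
result <- data.frame(ID = names(m), size = as.vector(m))
result
```

aggregate() returns a data.frame directly, so this extra step is only needed when starting from tapply().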

Jeff.

