Dear list,

this must be an easy one:

I have a data.frame of two columns, "ID" with four different levels (A to D) and numerical "size", and each of the 4 different IDs is repeated a different number of times. I would like to get the mean size for each ID as another data.frame. I have tried the following:

> ID = as.character(unique(data[,1])) # I use unique() because "data" will be larger in future
> nIDs = length(ID)
> for(i in 1:nIDs){
+   subdata = subset(data, V1==ID[i])
+   average = as.data.frame(cbind(1:i, ID[i], mean(subdata[,2])))
+ }

Unfortunately, my output only gets the last level of ID four times:

> average
  V1 V2               V3
1  1  D 179.777777777778
2  2  D 179.777777777778
3  3  D 179.777777777778
4  4  D 179.777777777778

How can I get what I need? There might be an easier way to do it, but I guess my skills aren't that good. Any suggestions are welcome.

Regards,

David
data <- data.frame(ID = rep(letters[1:4], 5), size = rnorm(20, 0, 1))
aggregate(data$size, by = list(data$ID), mean)

From: <darteta001 at ikasle.ehu.es>
Sent by: r-help-bounces at r-project.org
To: r-help at r-project.org
Date: 01/10/2007 17:57
Subject: [R] mean of subset of rows

> I would like to get the mean size for each ID as another data.frame.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
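A possible variation on the aggregate() call above, offered as a sketch rather than the poster's own code: the formula interface of aggregate() keeps the original column names in the result instead of "Group.1" and "x". The data frame `dat` and its values are made up for illustration, with each ID repeated a different number of times as in the question.

```r
# Hypothetical example data: four IDs, each repeated a different number of times.
dat <- data.frame(
  ID   = rep(c("A", "B", "C", "D"), times = c(2, 3, 4, 1)),
  size = c(10, 20, 1, 2, 3, 5, 5, 5, 5, 7)
)

# Mean of size within each ID; the result is a data.frame
# with one row per ID and columns named "ID" and "size".
means <- aggregate(size ~ ID, data = dat, FUN = mean)
means
```

The result can be used like any other data.frame, for example `means$size[means$ID == "B"]`.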
--- darteta001 at ikasle.ehu.es wrote:

> I would like to get the mean size for each ID as another data.frame.
> I have tried the following:
>
> >ID= as.character(unique(data[,1]))
> >nIDs = length(ID)
> >for(i in 1:nIDs){
> + subdata = subset(data,V1==ID[i])
> + average = as.data.frame(cbind(1:i,ID[i],mean(subdata[,2])))
> + }

dfnames <- c("id", "v1")
# Small example data set: a grouping factor and a numeric column
mydata <- data.frame(id = factor(c("a", "a", "b", "c", "c", "b")),
                     v1 = c(2, 3, 3, 2, 2, 4))
mydata
# 'by' must be a list, so wrap the grouping column in list()
mysums <- aggregate(mydata["v1"], by = list(id = mydata$id), FUN = mean)
names(mysums) <- dfnames
mysums

I am not exactly sure what is happening in that loop, but you have no place to store the results of each iteration: "average" is overwritten on every pass. The loop below should work, but you are much better off using the aggregate command. Explicit for loops are generally discouraged in R when a vectorized function does the job. Good luck.

data <- mydata
ID <- as.character(unique(data[, 1]))
nIDs <- length(ID)
# Allocate storage for one mean per ID before the loop
average <- numeric(nIDs)
for (i in seq_len(nIDs)) {
  subdata <- subset(data, id == ID[i])
  average[i] <- mean(subdata[, 2])
}
average
newdata <- data.frame(ID, average)
names(newdata) <- dfnames
newdata
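For what it is worth, the original loop returns D four times because "average" is reassigned on every pass, and on the last pass 1:i is 1:4, so cbind() recycles the last ID and its mean across four rows. A loop-free sketch of the same per-ID calculation, using vapply() on made-up data (the names `dat`, `ids` and `result` are illustrative, not from the thread):

```r
# Hypothetical example data with unequal group sizes.
dat <- data.frame(
  ID   = c("A", "A", "B", "C", "C", "B"),
  size = c(2, 3, 3, 2, 2, 4)
)

# One entry per distinct ID, in order of first appearance.
ids <- unique(as.character(dat$ID))

# vapply() calls the function once per ID and collects the results
# into a numeric vector, so nothing is overwritten between IDs.
avg <- vapply(ids, function(k) mean(dat$size[dat$ID == k]), numeric(1))

result <- data.frame(ID = ids, size = unname(avg))
result
```

This does the same work as the preallocated loop, with the storage handled by vapply() instead of by hand.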
You were on the right track with the for loop, but often you can do the same thing looplessly (I know, it's not really a word) in R.

If your data is like this:

data <- data.frame(ID = rep(letters[1:4], 5), size = runif(20))

then apply either

tapply(data$size, data$ID, mean)

or

aggregate(data$size, list(data$ID), mean)

For further reference, section 4.2 in "An Introduction to R" describes using tapply in this way.

Jeff.

On Oct 1, 2007, at 11:57 AM, <darteta001 at ikasle.ehu.es> wrote:

> I would like to get the mean size for each ID as another data.frame.
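One difference between the two suggestions above is the shape of the result: tapply() returns a named one-dimensional array, while aggregate() returns a data.frame. Since the question asks for a data.frame, here is a minimal sketch of converting the tapply() result, using made-up data (the names `dat`, `m` and `out` are assumptions for illustration):

```r
# Hypothetical example data: IDs a to d with 1, 2, 3 and 4 rows respectively.
dat <- data.frame(
  ID   = rep(letters[1:4], times = c(1, 2, 3, 4)),
  size = c(4, 1, 3, 2, 2, 2, 0, 1, 2, 5)
)

# Named array of group means; the names are the ID levels.
m <- tapply(dat$size, dat$ID, mean)

# Move the names into a column and drop the array attributes.
out <- data.frame(ID = names(m), size = as.vector(m))
out
```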