Hi,

I am trying to create a script that will evaluate each column of a data frame, regardless of the number of columns, using some function and sorting the results by an index vector:

# upload data (112 rows x 73 columns)
SD <- read.csv("/Users/johnjacob/Desktop/StudentsData_RInput.csv", header=TRUE)

# assign index vector
ID <- SD[ ,2]

# write indexed mean function
meanfun <- function(x) {
  for(i in 3:ncol(x)) {
    meanSD <- tapply(x[,i], ID, FUN=mean)}
  return(meanSD)
}

# apply function to data
meanfun(SD)

What I get is one set of indexed means:

    7605   Andrea    Billy   ERR006    FJM13
2.111111 1.400000 1.888889 3.692308 3.750000
   Gayan  Jschaef  Whitney
1.300000 2.285714 2.000000

...and what I would like to generate is a set of indexed means for each column in the data set.

Any guidance would be much appreciated!

Best,
Logan

--
View this message in context: http://r.789695.n4.nabble.com/Conditional-Loop-For-Data-Frame-Columns-tp4276821p4276821.html
Sent from the R help mailing list archive at Nabble.com.

On Jan 8, 2012, at 4:48 PM, jawbonemurphy wrote:

> Hi,
>
> I am trying to create a script that will evaluate each column of a data
> frame, regardless of # columns, using some function and sorting the results
> by an index vector:

?lapply
?"["
?order

> #upload data (112 rows x 73 columns)
> SD <- read.csv("/Users/johnjacob/Desktop/StudentsData_RInput.csv",
> header=TRUE)
>
> #assign index vector
> ID <- SD[ ,2]
>
> #write indexed mean function
> meanfun <- function(x) {
> for(i in 3:ncol(x)) {
> meanSD <- tapply(x[,i], ID, FUN=mean)}

Aren't you worried about overwriting meanSD? This would appear to leave meanSD with only the result from the last column.

> return(meanSD)
> }
>

What are you expecting to get back? 'tapply' will very possibly return a matrix.

> #apply function to data
> meanfun(SD)
>
> What I get is one set of indexed means:
>
>     7605   Andrea    Billy   ERR006    FJM13
> 2.111111 1.400000 1.888889 3.692308 3.750000
>    Gayan  Jschaef  Whitney
> 1.300000 2.285714 2.000000
>
> ...and what I would like to generate is a set of indexed means

By indexed you mean grouped? Perhaps you should be looking at ?aggregate

> for each
> column in the data set.
> Any guidance would be much appreciated!

David Winsemius, MD
West Hartford, CT

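To illustrate the overwriting point and the ?lapply hint, here is a minimal sketch using sapply(), a close relative of lapply(). The toy data frame and the names toy and group_means are made up for illustration and are not the poster's data; the layout (ID in column 2, scores from column 3 on) is assumed to mirror SD.

```r
# Toy stand-in for SD: column 2 holds the ID, columns 3 onward hold scores.
# All names and values are invented for this example.
toy <- data.frame(
  row = 1:6,
  ID  = c("a", "a", "b", "b", "c", "c"),
  q1  = c(1, 3, 2, 4, 5, 7),
  q2  = c(10, 20, 30, 50, 0, 0)
)

# sapply() visits each score column and keeps every tapply() result,
# rather than overwriting a single variable on each pass of a for loop.
group_means <- sapply(toy[, 3:ncol(toy)],
                      function(col) tapply(col, toy$ID, mean))

print(group_means)
```

The result should be a matrix with one row per ID and one column per score column, so nothing is lost between iterations.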
Hello,

I believe that either of the following solves it:

aggregate(SD[, 3:ncol(SD)], by=list(ID), mean)
aggregate(SD[, 3:ncol(SD)], by=list(ID), mean, na.rm=TRUE)

It's the second you want: with na.rm=TRUE it drops the NAs within each group before averaging, and returns NaN only for groups whose values are all NA. Without it, any group that contains an NA gets NA.

Rui Barradas

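A small sketch of the difference between the two aggregate() calls, on an invented data frame (toy, plain and cleaned are hypothetical names; the values are chosen so that one group is partly NA and one is entirely NA):

```r
# Group "a" has one NA, group "b" has none, group "c" is all NA.
toy <- data.frame(
  ID = c("a", "a", "b", "b", "c", "c"),
  q1 = c(1, NA, 2, 4, NA, NA)
)

# Without na.rm, any group containing an NA comes back as NA.
plain <- aggregate(toy["q1"], by = list(ID = toy$ID), FUN = mean)

# With na.rm = TRUE the NAs are dropped first; group "c" has nothing
# left to average, so its mean is NaN.
cleaned <- aggregate(toy["q1"], by = list(ID = toy$ID), FUN = mean, na.rm = TRUE)

print(plain)
print(cleaned)
```

The extra na.rm argument is simply passed through aggregate() to mean().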
P.S. If you want to use your function, a revised version may be a good idea: it's faster.

# write indexed mean function
meanfun <- function(x, inx, na.rm=FALSE) {
  inx <- as.factor(inx)  # levels() below needs a factor
  # one row per group, one column per data column from the 3rd onward
  meanSD <- matrix(0, nrow=length(levels(inx)), ncol=length(3:ncol(x)))
  for(i in 3:ncol(x)) {
    # group by the 'inx' argument rather than the global ID
    meanSD[, i - 2] <- tapply(x[,i], inx, FUN=mean, na.rm=na.rm)
  }
  return(meanSD)
}

# apply function to data
meanfun(SD, ID, TRUE)

# compare results
nc <- ncol(SD)
meanfun(SD, ID, TRUE)[, (nc-9):(nc-3)]
aggregate(SD[, 3:nc], by=list(ID), mean, na.rm=TRUE)[, (nc-8):(nc-2)]

# now make it bigger for timing
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
dim(SD)
ID <- SD[ ,2]

system.time(aggregate(SD[, 3:nc], by=list(ID), mean))
   user  system elapsed
   9.72    0.01    9.75

system.time(meanfun(SD, ID))
   user  system elapsed
   3.21    0.03    3.24

Rui Barradas
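For anyone wanting to check that a preallocate-and-fill loop and aggregate() give the same numbers before trusting the timings, here is a minimal sketch on invented data. The helper loop_means, its first argument, and the toy data frame are hypothetical and only assume the same layout as SD (ID in column 2, scores from column 3 on).

```r
toy <- data.frame(
  row = 1:6,
  ID  = factor(c("a", "a", "b", "b", "c", "c")),
  q1  = c(1, 3, 2, 4, 5, 7),
  q2  = c(10, 20, 30, 50, 0, 0)
)

# Hypothetical helper: preallocate one row per group and one column per
# score column, then fill the matrix column by column with tapply().
loop_means <- function(x, inx, first = 3) {
  inx  <- as.factor(inx)
  cols <- first:ncol(x)
  out  <- matrix(NA_real_, nrow = nlevels(inx), ncol = length(cols),
                 dimnames = list(levels(inx), names(x)[cols]))
  for (j in seq_along(cols)) {
    out[, j] <- tapply(x[[cols[j]]], inx, mean)
  }
  out
}

from_loop <- loop_means(toy, toy$ID)
from_agg  <- aggregate(toy[, 3:ncol(toy)], by = list(ID = toy$ID), FUN = mean)

# Same numbers, different containers: a matrix versus a data frame
# whose first column holds the group labels.
same <- all.equal(unname(from_loop), unname(as.matrix(from_agg[, -1])))
print(same)
```

Labelling the rows and columns of the matrix makes it easier to line the two results up by eye.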