Hi,
I am trying to create a script that will evaluate each column of a data
frame, regardless of the number of columns, using some function and sorting
the results by an index vector:
#upload data (112 rows x 73 columns)
SD <- read.csv("/Users/johnjacob/Desktop/StudentsData_RInput.csv",
header=TRUE)
#assign index vector
ID <- SD[ ,2]
#write indexed mean function
meanfun <- function(x) {
  for(i in 3:ncol(x)) {
    meanSD <- tapply(x[,i], ID, FUN=mean)
  }
  return(meanSD)
}
#apply function to data
meanfun(SD)
What I get is one set of indexed means:
7605 Andrea Billy ERR006 FJM13
2.111111 1.400000 1.888889 3.692308 3.750000
Gayan Jschaef Whitney
1.300000 2.285714 2.000000
...and what I would like to generate is a set of indexed means for each
column in the data set.
Any guidance would be much appreciated!
Best,
Logan
--
View this message in context:
http://r.789695.n4.nabble.com/Conditional-Loop-For-Data-Frame-Columns-tp4276821p4276821.html
Sent from the R help mailing list archive at Nabble.com.
On Jan 8, 2012, at 4:48 PM, jawbonemurphy wrote:

> Hi,
>
> I am trying to create a script that will evaluate each column of a
> data
> frame, regardless of # columns, using some function and sorting the
> results
> by an index vector:

?lapply
?"["
?order

> #upload data (112 rows x 73 columns)
> SD <- read.csv("/Users/johnjacob/Desktop/StudentsData_RInput.csv",
> header=TRUE)
>
> #assign index vector
> ID <- SD[ ,2]
>
> #write indexed mean function
> meanfun <- function(x) {
> for(i in 3:ncol(x)) {
> meanSD <- tapply(x[,i], ID, FUN=mean)}

Aren't you worried about overwriting meanSD? This would appear to leave
meanSD with only the result from the last column.

> return(meanSD)
> }
>

What are you expecting to get back? 'tapply' will very possibly return a
matrix.

> #apply function to data
> meanfun(SD)
>
> What I get is one set of indexed means:
>
> 7605 Andrea Billy ERR006 FJM13
> 2.111111 1.400000 1.888889 3.692308 3.750000
> Gayan Jschaef Whitney
> 1.300000 2.285714 2.000000
>
> ...and what I would like to generate is a set of indexed means

By "indexed", do you mean grouped? Perhaps you should be looking at
?aggregate

> for each
> column in the data set.
> Any guidance would be much appreciated!
>

David Winsemius, MD
West Hartford, CT
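David's two pointers (the overwritten meanSD and ?lapply) suggest two fixes, sketched below under some assumptions. The data frame is a made-up six-row stand-in for SD (the column names `student`, `name`, `q1`, `q2` and the helper name `meanfun_list` are hypothetical), with the grouping labels in column 2 as in the original. The loop version stores each column's tapply() result in a list slot instead of overwriting one variable; the sapply() one-liner does the same without an explicit loop and simplifies the result to a matrix with one row per group and one column per data column.

```r
# Hypothetical stand-in for SD: column 2 holds the group labels,
# columns 3 onward hold the numeric scores.
SD <- data.frame(
  student = 1:6,
  name    = c("Andrea", "Billy", "Andrea", "Gayan", "Billy", "Gayan"),
  q1      = c(2, 1, 4, 1, 3, 2),
  q2      = c(5, 3, 3, 2, 1, 4)
)
ID <- SD[, 2]

# Loop fix (hypothetical name): keep every column's result in its own
# list slot instead of overwriting a single variable on each pass.
meanfun_list <- function(x, inx) {
  res <- vector("list", ncol(x) - 2)   # one slot per data column
  names(res) <- names(x)[3:ncol(x)]
  for (i in 3:ncol(x)) {
    res[[i - 2]] <- tapply(x[, i], inx, FUN = mean)
  }
  res
}
meanfun_list(SD, ID)

# Loop-free equivalent: sapply() calls tapply(column, ID, mean) on each
# data column and binds the per-group vectors into a matrix.
by_col <- sapply(SD[, 3:ncol(SD)], tapply, ID, mean)
by_col
```

The sapply() form is usually the more convenient one, since the result is already a single groups-by-columns matrix.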
Hello,

I believe that the following solves it:

aggregate(SD[, 3:ncol(SD)], by=list(ID), mean)
aggregate(SD[, 3:ncol(SD)], by=list(ID), mean, na.rm=TRUE)

It's the second one you want. The first returns NA for any group that
contains at least one NA; the second computes the means for groups that
aren't entirely NA and returns NaN for groups whose values are all NA.

Rui Barradas
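To see the difference between the two aggregate() calls, here is a sketch on a hypothetical six-row data frame (made-up names and values) in which Billy has one missing q1 score and both of Gayan's q2 scores are missing. Without na.rm, a single NA makes its group's mean NA; with na.rm=TRUE the NAs are dropped within each group, and a group with nothing left gets NaN, the mean of an empty vector.

```r
# Hypothetical toy data: Billy has one missing q1, Gayan's q2 is all NA.
SD <- data.frame(
  student = 1:6,
  name    = c("Andrea", "Billy", "Andrea", "Gayan", "Billy", "Gayan"),
  q1      = c(2, 1, 4, 1, NA, 2),
  q2      = c(5, 3, 3, NA, 1, NA)
)
ID <- SD[, 2]

# Without na.rm, any group that contains an NA gets NA.
agg_plain <- aggregate(SD[, 3:ncol(SD)], by = list(ID), mean)

# With na.rm = TRUE (passed on to mean), NAs are dropped inside each group;
# a group whose values are all NA is left empty, and its mean is NaN.
agg_narm <- aggregate(SD[, 3:ncol(SD)], by = list(ID), mean, na.rm = TRUE)

agg_plain
agg_narm
```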
P.S.
If you want to keep using your own function, here is a revised version. It
may be a good idea, because it's faster.
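#write indexed mean function
meanfun <- function(x, inx, na.rm=FALSE) {
  inx <- factor(inx)  # levels() is NULL for a character vector
  cols <- 3:ncol(x)
  # one row per group, one column per data column (3 to ncol)
  meanSD <- matrix(0, nrow=nlevels(inx), ncol=length(cols),
                   dimnames=list(levels(inx), names(x)[cols]))
  for(i in cols) {
    # group by the inx argument rather than the global ID
    meanSD[, i - 2] <- tapply(x[,i], inx, FUN=mean, na.rm=na.rm)
  }
  return(meanSD)
}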
# apply function to data
meanfun(SD, ID, TRUE)
# compare results
nc <- ncol(SD)
meanfun(SD, ID, TRUE)[, (nc-9):(nc-3)]
aggregate(SD[, 3:nc], by=list(ID), mean, na.rm=TRUE)[, (nc-8):(nc-2)]
# now make it bigger for timing
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
dim(SD)
ID <- SD[ ,2]
system.time(aggregate(SD[, 3:nc], by=list(ID), mean))
user system elapsed
9.72 0.01 9.75
system.time(meanfun(SD, ID))
user system elapsed
3.21 0.03 3.24
Rui Barradas
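The column offsets in the comparison step exist because aggregate() puts the group labels in an extra first column (Group.1): data column i is column i - 2 of meanfun()'s matrix but column i - 1 of aggregate()'s data frame. A sketch of the same check on hypothetical toy data, dropping Group.1 and comparing the whole result at once with all.equal(). The meanfun() here is a copy of the revised version that groups by its own inx argument; the data values are made up.

```r
# Hypothetical toy data standing in for SD.
SD <- data.frame(
  student = 1:6,
  name    = c("Andrea", "Billy", "Andrea", "Gayan", "Billy", "Gayan"),
  q1      = c(2, 1, 4, 1, NA, 2),
  q2      = c(5, 3, 3, 2, 1, 4)
)
ID <- SD[, 2]

# Revised function: groups by inx and labels rows (groups) and columns.
meanfun <- function(x, inx, na.rm = FALSE) {
  inx  <- factor(inx)
  cols <- 3:ncol(x)
  meanSD <- matrix(NA_real_, nrow = nlevels(inx), ncol = length(cols),
                   dimnames = list(levels(inx), names(x)[cols]))
  for (i in cols) {
    meanSD[, i - 2] <- tapply(x[, i], inx, FUN = mean, na.rm = na.rm)
  }
  meanSD
}

m   <- meanfun(SD, ID, na.rm = TRUE)
agg <- aggregate(SD[, 3:ncol(SD)], by = list(ID), mean, na.rm = TRUE)

# Drop aggregate()'s Group.1 column, then compare values only.
same <- all.equal(unname(m), unname(as.matrix(agg[, -1])))
same
```

Both functions order the groups by sorted factor levels, which is why the rows line up without any reordering.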
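Not something suggested in the thread, but another base-R route to grouped column means is rowsum(), which sums every column by group in a single call; dividing by the number of values per group gives the means. The helper name `grouped_means` is hypothetical, and the sketch assumes numeric data columns and no NA in the grouping vector. It has not been timed against the two versions above, so whether it is faster on the real 112 x 73 data is untested.

```r
# Hypothetical toy data standing in for SD.
SD <- data.frame(
  student = 1:6,
  name    = c("Andrea", "Billy", "Andrea", "Gayan", "Billy", "Gayan"),
  q1      = c(2, 1, 4, 1, NA, 2),
  q2      = c(5, 3, 3, NA, 1, NA)
)
ID <- SD[, 2]

# Hypothetical helper: per-group column means via rowsum().
grouped_means <- function(x, g, na.rm = FALSE) {
  x <- as.matrix(x)                                 # numeric matrix of data columns
  sums <- rowsum(x, g, na.rm = na.rm)               # per-group column sums, groups sorted
  if (na.rm) {
    counts <- rowsum((!is.na(x)) * 1, g)            # non-missing values per group and column
  } else {
    counts <- as.vector(rowsum(rep(1, nrow(x)), g)) # group sizes, same row order as sums
  }
  sums / counts  # with na.rm = TRUE, an all-NA group gives 0/0 = NaN, like mean()
}

grouped_means(SD[, 3:ncol(SD)], ID, na.rm = TRUE)
```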