thr3ads.net - R help - [R] Conditional Loop For Data Frame Columns [Jan 2012]

jawbonemurphy

2012-Jan-08 21:48 UTC

[R] Conditional Loop For Data Frame Columns

Hi,

I am trying to create a script that will evaluate each column of a data
frame, regardless of # columns, using some function and sorting the results
by an index vector:

#upload data (112 rows x 73 columns)
SD <- read.csv("/Users/johnjacob/Desktop/StudentsData_RInput.csv",
header=TRUE)

#assign index vector
ID <- SD[ ,2]

#write indexed mean function
meanfun <- function(x) {
for(i in 3:ncol(x)) {
  meanSD <- tapply(x[,i], ID, FUN=mean)}
return(meanSD)
}

#apply function to data
meanfun(SD)

What I get is one set of indexed means:

7605   Andrea    Billy   ERR006    FJM13 
2.111111 1.400000 1.888889 3.692308 3.750000 
   Gayan  Jschaef  Whitney 
1.300000 2.285714 2.000000 

...and what I would like to generate is a set of indexed means for each
column in the data set.  
Any guidance would be much appreciated!

Best,
Logan




--
View this message in context:
http://r.789695.n4.nabble.com/Conditional-Loop-For-Data-Frame-Columns-tp4276821p4276821.html
Sent from the R help mailing list archive at Nabble.com.

David Winsemius

2012-Jan-09 01:43 UTC

[R] Conditional Loop For Data Frame Columns

On Jan 8, 2012, at 4:48 PM, jawbonemurphy wrote:
> Hi,
>
> I am trying to create a script that will evaluate each column of a data
> frame, regardless of # columns, using some function and sorting the
> results by an index vector:
?lapply
?"["
?order
> #upload data (112 rows x 73 columns)
> SD <- read.csv("/Users/johnjacob/Desktop/StudentsData_RInput.csv",
> header=TRUE)
>
> #assign index vector
> ID <- SD[ ,2]
>
> #write indexed mean function
> meanfun <- function(x) {
> for(i in 3:ncol(x)) {
>  meanSD <- tapply(x[,i], ID, FUN=mean)}
Aren't you worried about overwriting meanSD? Each pass through the loop
replaces it, so this would appear to leave meanSD with only the result
from the last column.
> return(meanSD)
> }
>
What are you expecting to get back? 'tapply' will very possibly return  
a matrix.
> #apply function to data
> meanfun(SD)
>
> What I get is one set of indexed means:
>
> 7605   Andrea    Billy   ERR006    FJM13
> 2.111111 1.400000 1.888889 3.692308 3.750000
>   Gayan  Jschaef  Whitney
> 1.300000 2.285714 2.000000
>
> ...and what I would like to generate is a set of indexed means
By indexed you mean grouped? Perhaps you should be looking at ?aggregate
> for each
> column in the data set.
> Any guidance would be much appreciated!
>
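
One way to avoid the overwriting noted above is to collect one result per column instead of reassigning a single variable. The following is only a minimal sketch on made-up data: `toy`, its column names, and `group_means` are hypothetical stand-ins, not objects from the original post.

```r
# Toy stand-in for SD: column 1 is a row number, column 2 the grouping ID,
# columns 3 onward hold numeric scores. All names and values are made up.
toy <- data.frame(
  row   = 1:6,
  ID    = c("a", "a", "b", "b", "c", "c"),
  test1 = c(1, 3, 2, 4, 5, 7),
  test2 = c(10, 20, 30, 50, 70, 90)
)

# sapply() calls tapply() once per data column and binds the per-group
# means into a matrix: one row per ID, one column per data column.
group_means <- sapply(toy[, 3:ncol(toy), drop = FALSE],
                      function(col) tapply(col, toy$ID, mean))
print(group_means)
```

Because every column yields a result of the same length (one mean per ID), sapply() can simplify the list of results into a matrix; nothing is overwritten along the way.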
David Winsemius, MD
West Hartford, CT

Rui Barradas

2012-Jan-10 01:40 UTC

[R] Conditional Loop For Data Frame Columns

Hello,

I believe that the following solves it:

aggregate(SD[, 3:ncol(SD)], by=list(ID), mean)
aggregate(SD[, 3:ncol(SD)], by=list(ID), mean, na.rm=TRUE)

The second is the one you want: it computes the means for groups that
are not entirely NA, and returns NaN for groups in which every value is NA.
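
The effect of na.rm=TRUE can be seen on a small made-up example. This is only an illustrative sketch: `toy`, `score`, `plain`, and `cleaned` are hypothetical names, not part of the original data.

```r
# Made-up data: group "b" has one missing value, group "c" is entirely NA.
toy <- data.frame(
  ID    = c("a", "a", "b", "b", "c", "c"),
  score = c(1, 3, 2, NA, NA, NA)
)

# Without na.rm, any group containing an NA gets an NA mean.
plain <- aggregate(toy["score"], by = list(ID = toy$ID), FUN = mean)

# With na.rm = TRUE, NAs are dropped before averaging; a group left
# with no values at all yields NaN.
cleaned <- aggregate(toy["score"], by = list(ID = toy$ID), FUN = mean,
                     na.rm = TRUE)

print(plain)
print(cleaned)
```

The extra na.rm argument is passed through aggregate() to mean(), which is why it can simply be appended to the call.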

Rui Barradas


--
View this message in context:
http://r.789695.n4.nabble.com/Conditional-Loop-For-Data-Frame-Columns-tp4276821p4280750.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

2012-Jan-10 02:33 UTC

[R] Conditional Loop For Data Frame Columns

P.S.

If you would rather keep your own function, a revised version may be a good
idea: in the timings below it is faster than aggregate.


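#write indexed mean function
meanfun <- function(x, inx, na.rm=FALSE) {
	# levels() needs a factor; read.csv in R >= 4.0 returns character columns
	inx <- as.factor(inx)
	cols <- 3:ncol(x)
	# one row per group, one column per data column
	meanSD <- matrix(0, nrow=nlevels(inx), ncol=length(cols))
	for(i in cols) {
		# group by the inx argument rather than the global ID
		meanSD[, i - 2] <- tapply(x[,i], inx, FUN=mean, na.rm=na.rm)
	}
	return(meanSD)
}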


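# apply function to data
meanfun(SD, ID, TRUE)

# compare results (last few columns only)
nc <- ncol(SD)  # number of columns in SD; nc was not defined in the original post
meanfun(SD, ID, TRUE)[, (nc-9):(nc-3)]
# aggregate adds a leading group column, hence the indices shifted by one
aggregate(SD[, 3:nc], by=list(ID), mean, na.rm=TRUE)[, (nc-8):(nc-2)]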


# now make it bigger for timing
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
SD <- rbind(SD, SD, SD, SD, SD, SD, SD, SD)
dim(SD)

ID <- SD[ ,2]
system.time(aggregate(SD[, 3:nc], by=list(ID), mean))
   user  system elapsed 
   9.72    0.01    9.75

system.time(meanfun(SD, ID))
   user  system elapsed 
   3.21    0.03    3.24
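
The comparison and timing above can be tried without the original CSV. The following is a rough sketch on synthetic data: `toy`, `loop_means`, `by_loop`, `by_agg`, and `same` are hypothetical names, and the data are random numbers rather than anything from the original file.

```r
set.seed(1)  # reproducible made-up data

# Synthetic stand-in for SD: two leading columns, then 20 numeric columns.
n <- 2000
toy <- data.frame(
  row = seq_len(n),
  ID  = factor(sample(letters[1:8], n, replace = TRUE)),
  matrix(rnorm(n * 20), nrow = n)
)

# Matrix-filling loop, grouping by the inx argument that is passed in.
loop_means <- function(x, inx, na.rm = FALSE) {
  inx <- as.factor(inx)
  cols <- 3:ncol(x)
  out <- matrix(0, nrow = nlevels(inx), ncol = length(cols))
  for (i in cols) {
    out[, i - 2] <- tapply(x[, i], inx, FUN = mean, na.rm = na.rm)
  }
  out
}

by_loop <- loop_means(toy, toy$ID)
by_agg  <- aggregate(toy[, 3:ncol(toy)], by = list(toy$ID), FUN = mean)

# Drop the grouping column and the dimnames before comparing the numbers.
same <- all.equal(by_loop, unname(as.matrix(by_agg[, -1])))
print(same)

# Timings depend on the machine and R version, so none are quoted here.
print(system.time(loop_means(toy, toy$ID)))
print(system.time(aggregate(toy[, 3:ncol(toy)], by = list(toy$ID), FUN = mean)))
```

Checking the two results with all.equal() before timing them guards against comparing the speed of two functions that do not compute the same thing.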

Rui Barradas


--
View this message in context:
http://r.789695.n4.nabble.com/Conditional-Loop-For-Data-Frame-Columns-tp4276821p4280873.html
Sent from the R help mailing list archive at Nabble.com.
