Matt Spitzer
2012-Feb-20 03:33 UTC
[R] repeating or looping within an apply statement to handle multiple variables
Dear R experts,

I would like to ask for your help with repeating steps in an apply statement. I have a data frame that lists multiple variables for a given id and visit, as well as the drug treatment:

> head(exp)
  id visit variable1 variable2 variable3 variable4 drug
1  3     1        13        10         7        11    0
2  3     5        10        15         9         9    0
3  3    12         9        10         8         8    0
4  7     1        12         8         9         8    1
5  7     5        16         9         3        10    1
6  7    12         5        11         9        14    1

I would like to process these variables to find the difference between visit 5 and visit 1 for each id, and then summarize these differences in terms of means and errors. Thus far, with your brilliant advice to employ do.call and lapply, I have been able to process one variable at a time, but I would much prefer to loop over the variables so that the results for all of them end up in one efficiently stored data set. I would like to get data sets such as:

> exp1
     id  variable drug d5.3
3     3 variable1    0   -3
7     7 variable1    1    4
13   13 variable1    0   -5
56   56 variable1    0    4
78   78 variable1    0    7
109 109 variable1    0   -3
145 145 variable1    0   -2
173 173 variable1    0    9
212 212 variable1    1   -7
3     3 variable2    ?    ?
7     7 variable2    ?    ?
13   13 variable2    ?    ?
56   56 variable2    ?    ?
78   78 variable2    ?    ?
109 109 variable2    ?    ?
145 145 variable2    ?    ?
173 173 variable2    ?    ?
212 212 variable2    ?    ?
3     3 variable3    ?    ?
etc...

> exp2
   variable difference gel mean       sd n       se     X95ci    mean.sd
0 variable1       d5.1   0  1.0 5.567764 7 2.104417  5.149323  0.1796053
1 variable1       d5.1   1 -1.5 7.778175 2 5.500000 69.884126 -0.1928473
      se.sd X95ci.sd
0 0.3779645 0.9248457
1 0.7071068 8.9846435

However, I have only been able to get the data for the first variable, despite attempting loops over the variable names, i.e. for (i in c('variable1','variable2','variable3','variable4')). Do you have any thoughts on how to repeat lapply across many column variables? I greatly appreciate your thoughts.
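One way to avoid the explicit for loop is to pass the column name as an argument and wrap the per-id lapply in an outer lapply over the variable names. The following is only a minimal sketch, assuming the layout shown in the head(exp) output (columns id, visit, drug and variable1, variable2, ...); the small data frame `dat` and the helper `diff_visits` are hypothetical names made up for illustration.

```r
# Hypothetical example data with the same layout as exp:
# one row per id and visit, a drug indicator, and several measured variables
set.seed(1)
dat <- data.frame(
  id        = rep(c(3, 7, 13), each = 3),
  visit     = rep(c(1, 5, 12), times = 3),
  variable1 = round(rnorm(9, mean = 10, sd = 3)),
  variable2 = round(rnorm(9, mean = 10, sd = 3)),
  drug      = rep(c(0, 1, 0), each = 3)
)

# Columns to process (assumption: every measured variable is listed here)
vars <- c("variable1", "variable2")

# Hypothetical helper: visit-5 value minus visit-1 value of column `v`
# within one id group (NA if either value is missing)
diff_visits <- function(.grp, v) {
  .grp[.grp$visit == 5, v] - .grp[.grp$visit == 1, v]
}

# Outer lapply over variable names, inner lapply over id groups;
# do.call(rbind, ...) stacks everything into one long data frame.
# The column is called d5.3 to match the name used in exp1.
exp1 <- do.call(rbind, lapply(vars, function(v) {
  do.call(rbind, lapply(split(dat, dat$id), function(.grp) {
    data.frame(id = .grp$id[1L], variable = v, drug = .grp$drug[1L],
               d5.3 = diff_visits(.grp, v))
  }))
}))
rownames(exp1) <- NULL
exp1
```

Because the variable name is just an argument, the same code handles any number of columns; `vars` could also be built with something like `grep('^variable', names(dat), value = TRUE)`.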
I have supplied the code for the example and my work thus far below:

exp <- data.frame(id = rep(c(3, 7, 13, 56, 78, 109, 145, 173, 212), each = 3),
                  visit = rep(c(1, 5, 12), times = 9),
                  variable1 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  variable2 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  variable3 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  variable4 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  drug = rep(round(rnorm(mean = 0.5, sd = 0.1, n = 9), 0), each = 3))

# introduce some missing values in variable1
exp[exp[, 'visit'] == 1 & exp[, 'id'] == 3, 'variable1'] <- NA
exp[exp[, 'visit'] == 5 & exp[, 'id'] == 56, 'variable1'] <- NA

# visit 5 minus visit 1 of variable1 for each id
# (data.frame() turns the name 'd5-3' into the column name d5.3)
exp1 <- do.call(rbind, lapply(split(exp, exp$id), function(.grp) {
  data.frame('id' = .grp$id[1L],
             'variable' = 'variable1',
             'drug' = .grp$drug[1L],
             'd5-3' = .grp[.grp[['visit']] == 5, ]$variable1 -
                      .grp[.grp[['visit']] == 1, ]$variable1)
}))

# summary statistics of the differences within each drug group
exp2 <- do.call(rbind, lapply(split(exp1, exp1$drug), function(.grp) {
  a <- na.omit(.grp$d5.3)
  data.frame('variable' = 'variable1',
             'difference' = 'd5.1',
             'gel' = .grp$drug[1L],
             'mean' = mean(a),
             'sd' = sd(a),
             'n' = length(a),
             'se' = sd(a) / sqrt(length(a)),
             '95ci' = qt(0.975, (length(a) - 1)) * sd(a) / sqrt(length(a)),
             'mean/sd' = mean(a) / sd(a),
             'se/sd' = (sd(a) / sqrt(length(a))) / sd(a),
             '95ci/sd' = (qt(0.975, (length(a) - 1)) * sd(a) / sqrt(length(a))) / sd(a))
}))

Thanks again for your help,
Matt
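The summary step can be extended in the same spirit by splitting the long data on both variable and drug, so that each group is one variable/drug combination. This is a sketch, assuming a long data frame shaped like exp1 (columns id, variable, drug and the difference in d5.3); the data frame `long` and the helper `summarise_diff` are hypothetical, and the numbers are invented purely to exercise the code.

```r
# Hypothetical long-format data shaped like exp1:
# one row per id and variable, visit 5 minus visit 1 stored in d5.3
long <- data.frame(
  id       = rep(c(3, 7, 13, 56, 78, 109), times = 2),
  variable = rep(c("variable1", "variable2"), each = 6),
  drug     = rep(c(0, 1, 0, 1, 0, 1), times = 2),
  d5.3     = c(-3, 4, -5, NA, 7, -2, 2, 1, 6, -1, 0, 3)
)

# Hypothetical helper: the same statistics as in exp2, NA values dropped
summarise_diff <- function(a) {
  a  <- as.numeric(na.omit(a))
  n  <- length(a)
  s  <- sd(a)
  se <- s / sqrt(n)
  ci <- qt(0.975, n - 1) * se   # half-width of the 95% t interval
  data.frame(mean = mean(a), sd = s, n = n, se = se, X95ci = ci,
             mean.sd = mean(a) / s, se.sd = se / s, X95ci.sd = ci / s)
}

# One group per variable/drug combination; drop = TRUE skips empty ones
groups <- split(long, list(long$variable, long$drug), drop = TRUE)
exp2 <- do.call(rbind, lapply(groups, function(.grp) {
  cbind(variable = .grp$variable[1L], difference = "d5.1",
        gel = .grp$drug[1L], summarise_diff(.grp$d5.3))
}))

# Sort by variable, then drug group, and reset the row names
exp2 <- exp2[order(exp2$variable, exp2$gel), ]
rownames(exp2) <- NULL
exp2
```

Because the grouping includes the variable name, a single pass summarises every variable, provided the differences for all variables have first been stacked into one long data frame. Groups with fewer than two non-missing differences would give NA or NaN for sd and the interval, so real data may need a guard for that case.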