Claire Jouseau
2008-Apr-13 17:47 UTC
[R] Repeatedly apply multiple functions to subsets of data.
Dear R-users:

I have a large data frame with the following format:

> plants
id      trt  year  size   num  spA  spB  spZ
1011a   1    1     23.2   3    12   3.2  8
1011a   1    2     17.9   2    10   5.1  2.8
1011a   1    3     12.5   7    12   0    0.5
1011b   2    1     NA     NA   NA   NA   NA
1011b   2    2     6      6    4    2    0
1011b   2    3     100.3  5    3    95   2.3
28105a  1    1     9.1    8    0.5  0    8.6
28105a  1    2     16.6   4    2    12   4.6
28105a  1    3     8.7    7    1    0.2  7.5

I am looking for advice on how to select the subset of rows belonging to each id (the id numbers are non-sequential), apply a series of functions to that subset (excluding rows with missing data), and collect the results in a new data frame with one row per unique id.

I need to perform the following calculations on each subset of id numbers:

1) for all columns: mean, standard deviation, and variance
2) for columns "spA" to "spZ": sum of the covariance matrix and sum of the variances of the individual columns
3) for columns "size" and "year": linear regression of the form lm(size~year)

Ideally my new data frames would have the following formats:

> plants.calc
id     trt  mean.size  sd.size  mean.num  sd.num  sum.spcovar  sum.spvar  mean.spA  sd.spA  var.spA
1011a  a    17.9       5.4      4.0       2.6     17.12        22.74      11.33     1.15    1.33

> plants.lm
id     intercept  se.intercept  estimate  se.estimate  adj.Rsq  Tvalue  Pvalue  N
1011a  28.57      0.06          -5.35     0.03         0.9999   458.09  0.0014  3

I am very new to R and have written the code below. With it I can successfully extract the summed covariance values, but nothing else, because I cannot figure out how (or whether it is possible) to extract the relevant columns from a list. Any help you can offer would be greatly appreciated.

Thanks,
Claire
n <- length(unique(plants$id))
output <- lapply(split(plants, plants$id), head, 3)
out <- as.array(output)
sum.spcovar <- NULL
col.mean <- NULL
col.sd <- NULL
col.var <- NULL
sum.spvar <- NULL
for (i in 1:n) {
  spcovar <- function(x) { colSums(var(x)) }
  sum.spcovar[i] <- sum(spcovar(out[[i]]))
  col.mean[i] <- colMeans(out[[i]])
  col.sd[i] <- sd(out[[i]])
  col.var[i] <- (sd(out[[i]])^2)
  sum.spvar[i] <- sum((sd(out[[i]]))^2)
}
plants.calc <- data.frame(unique(plants$id), rep(1:2, length(uniqueplants$id)),
                          sum.spcovar, sum.spvar, col.mean, col.sd, col.var)
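
One way the three calculations described in the post could be put together is sketched below. It is only an illustration, and it rests on several assumptions that are not in the original message: rows containing any NA are dropped with complete.cases() before splitting; sum.spcovar is read as the sum of every entry of the species covariance matrix (this reading may not reproduce the example value shown in the post); Tvalue and Pvalue are taken from the intercept row, because that is what matches the example row given for 1011a; and the helper names calc.one and lm.one are hypothetical.

```r
# Example data copied from the post.
plants <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id     trt year size  num spA  spB  spZ
1011a  1   1    23.2  3   12   3.2  8
1011a  1   2    17.9  2   10   5.1  2.8
1011a  1   3    12.5  7   12   0    0.5
1011b  2   1    NA    NA  NA   NA   NA
1011b  2   2    6     6   4    2    0
1011b  2   3    100.3 5   3    95   2.3
28105a 1   1    9.1   8   0.5  0    8.6
28105a 1   2    16.6  4   2    12   4.6
28105a 1   3    8.7   7   1    0.2  7.5
")

sp.cols  <- c("spA", "spB", "spZ")     # species columns
num.cols <- c("size", "num", sp.cols)  # columns to summarise

# Assumption: a row with any missing value is excluded entirely.
complete <- plants[complete.cases(plants), ]
by.id    <- split(complete, complete$id)  # one data frame per id

# Hypothetical helper: one-row summary for a single id.
calc.one <- function(d) {
  v <- var(d[sp.cols])                 # species covariance matrix
  stats <- unlist(lapply(num.cols, function(cn) {
    x <- d[[cn]]
    setNames(c(mean(x), sd(x), var(x)),
             paste(c("mean", "sd", "var"), cn, sep = "."))
  }))
  data.frame(id = d$id[1], trt = d$trt[1],
             sum.spcovar = sum(v),        # all entries of the matrix
             sum.spvar   = sum(diag(v)),  # per-species variances only
             t(stats), stringsAsFactors = FALSE)
}

# Hypothetical helper: one-row regression summary for a single id.
lm.one <- function(d) {
  s  <- summary(lm(size ~ year, data = d))
  co <- coef(s)  # rows "(Intercept)" and "year"
  data.frame(id           = d$id[1],
             intercept    = co["(Intercept)", "Estimate"],
             se.intercept = co["(Intercept)", "Std. Error"],
             estimate     = co["year", "Estimate"],
             se.estimate  = co["year", "Std. Error"],
             adj.Rsq      = s$adj.r.squared,
             Tvalue       = co["(Intercept)", "t value"],   # intercept row
             Pvalue       = co["(Intercept)", "Pr(>|t|)"],  # intercept row
             N            = nrow(d), stringsAsFactors = FALSE)
}

# Apply each helper to every id and stack the one-row results.
plants.calc <- do.call(rbind, lapply(by.id, calc.one))
plants.lm   <- do.call(rbind, lapply(by.id, lm.one))
```

Keeping each id's result as a one-row data frame means the pieces can be stacked with rbind, which avoids having to pull columns back out of a list afterwards. Note that an id with only two complete rows (1011b in this example) leaves no residual degrees of freedom, so its standard errors and p value are not meaningful.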