Kim Milferstedt
2006-Feb-01 22:12 UTC
[R] extracting and summarizing data from data.frame with table-like output
Hello,

I would like to summarize and extract statistics (means, standard errors, etc.) from a data set that comes as a large table. The table needs to be sorted according to certain categories (in the example below "day", "angle", "distance" and "location"). I would like the output as a table similar to the original data, but with the mean (or standard error, etc.) of all individual measurements of "res.1" at any "location" in one column, the means for "res.2" at any "location" in a second column, and so on.

1. How can I get the means that I extract from my data set into a format similar to the initial data, and not as a matrix as I get with tapply()?

2. How can I calculate the means for many variables at once, and not just for one as in tapply()?

3. How does R know what data to use with the functions order() or tapply()?

Thanks for your help,

Kim

Here is an example:

## The code below creates an artificial data set "jj" that resembles my real data.
res.1 <- c(rbinom(36, size=50, prob=0.6))
res.2 <- c(rbinom(36, size=20, prob=0.4))
day <- rep(rep(1:3,rep(6,3)),2)
angle <- rep(1:3, 12)
distance <- rep(rep(1:2,rep(3,2)),6)
location <- rep(1:2,c(18,18))
jj <- cbind(res.1,res.2,day,angle,distance, location)

## I order the data
pp <- order(day, angle, distance)
jj[pp,]

## Now I calculate the mean for "res.1" over the variable "location"
ss <- tapply(res.1,list(day, angle, distance, location), mean)
ss

## The result "ss" is printed as four matrices, but I want table-like output,
## possibly also with the means for res.2.

__________________________________________
Kim Milferstedt
University of Illinois at Urbana-Champaign
Department of Civil and Environmental Engineering
4125 Newmark Civil Engineering Building
205 North Mathews Avenue MC 250
Urbana, IL 61801
USA
phone: (001) 217 333-9663
fax: (001) 217 333-6968
email: milferst at uiuc.edu
http://cee.uiuc.edu/research/morgenroth/index.asp
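
One possible approach to the three questions above, offered as a minimal sketch rather than a definitive answer. It assumes the data are stored in a data.frame instead of the matrix that cbind() returns, and that "the mean over location" means averaging the two locations within each day/angle/distance cell. The names cell.means, cell.se and std.err are made up for illustration, and set.seed() is added only to make the artificial data reproducible. Base R's aggregate() returns a data.frame with one row per group and accepts several response columns at once. On question 3: order(day, angle, distance) in the example uses the free-standing vectors in the workspace, not the columns of jj; with(jj, ...) makes R look the names up inside the data.frame instead.

```r
set.seed(1)  # assumption: fixed seed so the artificial data are reproducible

res.1    <- rbinom(36, size = 50, prob = 0.6)
res.2    <- rbinom(36, size = 20, prob = 0.4)
day      <- rep(rep(1:3, rep(6, 3)), 2)
angle    <- rep(1:3, 12)
distance <- rep(rep(1:2, rep(3, 2)), 6)
location <- rep(1:2, c(18, 18))

# A data.frame keeps the column names attached to the data (cbind() gives a matrix)
jj <- data.frame(res.1, res.2, day, angle, distance, location)

# Question 3: with() evaluates order() using the columns of jj,
# not the vectors of the same name in the workspace
jj.sorted <- jj[with(jj, order(day, angle, distance)), ]

# Questions 1 and 2: one row per day/angle/distance cell, with the means of
# res.1 and res.2 (taken over both locations) in separate columns
cell.means <- aggregate(jj[c("res.1", "res.2")],
                        by  = jj[c("day", "angle", "distance")],
                        FUN = mean)

# Any function that returns a single number can be used instead of mean,
# e.g. a hypothetical standard-error helper
std.err <- function(x) sd(x) / sqrt(length(x))
cell.se <- aggregate(jj[c("res.1", "res.2")],
                     by  = jj[c("day", "angle", "distance")],
                     FUN = std.err)

print(cell.means)
```

Here cell.means has the grouping columns day, angle and distance followed by res.1 and res.2, so it can be sorted, subset or merged like the original table. Whether this grouping matches the real data depends on which variables should be averaged over; changing the columns passed to `by` changes the cells.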