Benjamin Dickgiesser
2006-Nov-07 10:18 UTC
[R] Better way to create tables of mean & standard deviations
Hi

I'm trying to create tables of means, standard deviations and numbers of observations (i) for each laboratory, (ii) for each batch number, and (iii) for each batch at each laboratory for the attached data.

I created these functions:

    summary.aggregate <- function(y, label, ...)
    {
        temp.mean <- aggregate(y, FUN=mean, ...)
        temp.sd <- aggregate(y, FUN=sd, ...)
        temp.length <- aggregate(y, FUN=length, ...)
        txtlabs <- makeLabel(label,length(temp.mean$x))

        temp <- data.frame(mean=temp.mean$x,stdev=temp.sd$x,n=temp.length$x,row.names=txtlabs)
    }

    makeLabel <- function(label,llength,increaseLag=FALSE)
    {
        x <- c()
        for(cnt in 1:llength)
        {
            if(increaseLag == TRUE && mode(cnt/2))
            {

            }
            x[cnt] <- paste(label,cnt)
        }
        x
    }

and can use the following command to create tables of means etc.:

    print(summary.aggregate(data.ceramic$Y,"Lab",by=list(data.ceramic$Lab)))

to create output like this:

              mean    stdev  n
    Lab 1 645.6125 65.94129 60
    Lab 2 655.2121 70.64094 60
    Lab 3 633.3161 80.48620 60
    Lab 4 650.3897 77.59191 60
    Lab 5 630.4955 84.98888 60
    Lab 6 656.2608 66.16100 60
    Lab 7 666.1775 74.39796 60
    Lab 8 663.1543 71.10769 60

The purpose of the first function is to calculate the mean, stdev etc., and the second is simply to create a labelling vector, e.g. c("Lab 1", "Lab 2", ..., "Lab 8").

This seems rather complex to me for what I am trying to achieve. Is there a better way to do this?
Also, I am having some trouble getting the labelling right for (iii), since it should look like:

         Batch     mean    stdev  n
    Lab 1     1 686.7179 53.37582 30
    Lab 1     2 695.8710 62.08583 30
    Lab 2     1 654.5317 94.19746 30
    Lab 2     2 702.9095 51.44984 30
    Lab 3     1 676.2975 69.13784 30
    Lab 3     2 692.1952 57.27212 30
    Lab 4     1 700.8995 56.91608 30
    Lab 4     2 702.5668 62.36488 30
    Lab 5     1 604.5070 50.01621 30
    Lab 5     2 614.5532 53.64149 30
    Lab 6     1 612.1006 58.09503 30
    Lab 6     2 597.8699 62.40710 30
    Lab 7     1 584.6934 74.66537 30
    Lab 7     2 620.3263 54.34871 30
    Lab 8     1 631.4555 74.34480 30
    Lab 8     2 623.7419 56.42492 30

Currently I'm using:

    temp <- summary.aggregate(data.ceramic$Y,"Lab",by=list(data.ceramic$Lab,data.ceramic$Batch))
    batchcnt <- c(1,2)
    print(data.frame(Batc=batchcnt,temp))

But that produces this output:

           Batc     mean    stdev  n
    Lab 1     1 686.7179 53.37582 30
    Lab 2     2 695.8710 62.08583 30
    Lab 3     1 654.5317 94.19746 30
    Lab 4     2 702.9095 51.44984 30
    Lab 5     1 676.2975 69.13784 30
    Lab 6     2 692.1952 57.27212 30
    Lab 7     1 700.8995 56.91608 30
    Lab 8     2 702.5668 62.36488 30
    Lab 9     1 604.5070 50.01621 30
    Lab 10    2 614.5532 53.64149 30
    Lab 11    1 612.1006 58.09503 30
    Lab 12    2 597.8699 62.40710 30
    Lab 13    1 584.6934 74.66537 30
    Lab 14    2 620.3263 54.34871 30
    Lab 15    1 631.4555 74.34480 30
    Lab 16    2 623.7419 56.42492 30

I can only think of rather complex ways to solve the labelling issue...

I would appreciate it if someone could point out better, cleaner or easier ways of achieving what I'm trying to do.

Benjamin
Chuck Cleland
2006-Nov-07 12:59 UTC
[R] Better way to create tables of mean & standard deviations
Benjamin Dickgiesser wrote:
> I'm trying to create tables of means, standard deviations and numbers
> of observations (i) for each laboratory (ii) for each batch number
> (iii) for each batch at each laboratory for the attached data.
> [...]
> I would appreciate it if someone could point out if there are
> better/cleaner/easier ways of achieving what I'm trying to do.

Does this help?
    # Summary function: mean, SD and N for each column of y
    g <- function(y) {
      s <- apply(y, 2, function(z) {
        z <- z[!is.na(z)]
        n <- length(z)
        # Every branch returns three values (Mean, SD, N)
        if(n==0) c(Mean=NA, SD=NA, N=0) else
        if(n==1) c(Mean=z, SD=NA, N=1) else {
          m <- mean(z)
          s <- sd(z)
          c(Mean=m, SD=s, N=n)
        }
      })
      w <- as.vector(s)
      names(w) <- as.vector(outer(rownames(s), colnames(s), paste, sep=''))
      w
    }

    # Simulated data: 8 labs, 2 batches, 480 observations
    df <- data.frame(LAB = rep(1:8, each=60),
                     BATCH = rep(c(1,2), 240),
                     Y = rnorm(480))

    library(Hmisc)

    with(df, summarize(cbind(Y), llist(LAB, BATCH), FUN = g,
                       stat.name=c("mean", "stdev", "n")))

       LAB BATCH        mean     stdev  n
     1   1     1  0.13467569 1.0623188 30
     2   1     2  0.15204232 1.0464287 30
     3   2     1 -0.14470044 0.7881942 30
     4   2     2 -0.34641739 0.9997924 30
     5   3     1 -0.17915298 0.9720036 30
     6   3     2 -0.13942702 0.8166447 30
     7   4     1  0.08761900 0.9046908 30
     8   4     2  0.27103640 0.7692970 30
     9   5     1  0.08017377 1.1537611 30
    10   5     2  0.01475674 1.0598336 30
    11   6     1  0.29208572 0.8006171 30
    12   6     2  0.10239509 1.1632274 30
    13   7     1 -0.35550603 1.2016190 30
    14   7     2 -0.33692452 1.0458184 30
    15   8     1 -0.03779253 1.0385098 30
    16   8     2 -0.18652758 1.1768540 30

    with(df, summarize(cbind(Y), llist(LAB), FUN = g,
                       stat.name=c("mean", "stdev", "n")))

      LAB        mean     stdev  n
    1   1  0.14335900 1.0454666 60
    2   2 -0.24555892 0.8983465 60
    3   3 -0.15929000 0.8902766 60
    4   4  0.17932770 0.8377011 60
    5   5  0.04746526 1.0988603 60
    6   6  0.19724041 0.9946316 60
    7   7 -0.34621527 1.1168682 60
    8   8 -0.11216005 1.1029466 60

Once you write the summary function g, it's not that complex. See ?summarize in the Hmisc package for more detail. Also, you might take a look at the doBy and reshape packages.

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
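
For comparison, the same kind of table can probably be produced with base R alone, because aggregate() accepts a function that returns several values and keeps the grouping variables as ordinary columns, so no hand-made row labels are needed. The following is only a sketch under stated assumptions: the data frame is simulated in the same shape as in the reply (it is not the ceramic data), and the helper name stats and the column names LAB, BATCH and Y are illustrative choices, not part of the original question.

```r
# Simulated stand-in for the real data (assumption): 8 labs, 2 batches,
# 480 observations, same layout as the example in the reply.
set.seed(1)
df <- data.frame(LAB   = rep(1:8, each = 60),
                 BATCH = rep(c(1, 2), 240),
                 Y     = rnorm(480))

# Hypothetical helper: the three statistics as one named vector.
stats <- function(z) {
  z <- z[!is.na(z)]
  c(mean = mean(z), stdev = sd(z), n = length(z))
}

# (i) One row per laboratory. aggregate() stores the statistics as a
# matrix column named Y; cbind() spreads it into separate columns.
by_lab <- aggregate(Y ~ LAB, data = df, FUN = stats)
by_lab <- cbind(by_lab["LAB"], by_lab$Y)

# (iii) One row per batch within each laboratory.
by_lab_batch <- aggregate(Y ~ LAB + BATCH, data = df, FUN = stats)
by_lab_batch <- cbind(by_lab_batch[c("LAB", "BATCH")], by_lab_batch$Y)

# aggregate() lets the first grouping variable vary fastest, so sort to
# get batch 1 and batch 2 of each lab on adjacent rows.
by_lab_batch <- by_lab_batch[order(by_lab_batch$LAB, by_lab_batch$BATCH), ]

print(by_lab, row.names = FALSE)
print(by_lab_batch, row.names = FALSE)
```

Table (ii), per batch number, would follow the same pattern with Y ~ BATCH. Because the lab and batch identifiers come back as columns, the labelling problem from the question (labels running from Lab 1 to Lab 16) does not arise.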