Alison Macalady
2010-Aug-21 14:15 UTC
[R] t.tests on a data.frame using an apply-type function
I have a data.frame with ~250 observations (rows) in each of ~50 categories (columns). I would like to perform t-tests on subsets of observations within each column, with the subsets defined by index vectors contained in other columns of the data.frame.

My data.frame looks something like this:

x<-data.frame(matrix(rnorm(200,mean=5,sd=.5),nrow=20))
colnames(x)<-c("site", "status", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8")
x$site<-as.factor(rep(c("A", "A", "B", "B", "C"), 4))
x$status<-as.factor(rep(c("D", "L"), 10))

I want to do t-tests on the numeric observations within the data.frame by "site" and by "status":

t.test(x[x$site == "A" & x$status =="D",]$X1, x[x$site == "A" & x$status =="L",]$X1)
t.test(x[x$site == "B" & x$status =="D",]$X1, x[x$site == "B" & x$status =="L",]$X1)
t.test(x[x$site == "C" & x$status =="D",]$X1, x[x$site == "C" & x$status =="L",]$X1)

t.test(x[x$site == "A" & x$status =="D",]$X2, x[x$site == "A" & x$status =="L",]$X2)
t.test(x[x$site == "B" & x$status =="D",]$X2, x[x$site == "B" & x$status =="L",]$X2)
t.test(x[x$site == "C" & x$status =="D",]$X2, x[x$site == "C" & x$status =="L",]$X2)

etc...

I know there must be a more efficient way to do this using a loop and one of the apply functions, e.g. something like this:

k=length(levels(x$site))
for (i in 1:k)
{
site<-levels(x$site)[i]
x1<-x[x$site == site, ]
results[i]<-apply(x1, 2, function(x1) {t.test(x1[x1$status == "D",], x1[x1$status == "L",])})
results
}

But I can't figure out how to write the apply call correctly...

I also wonder whether there's a way to use an apply-type function and avoid the loop altogether.

Thanks in advance!

Ali
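For readers hitting the same wall: the attempt above breaks because apply() first converts the data frame to a matrix (a character matrix here, because of the factor columns), and inside the anonymous function x1 is then a single column vector, so x1$status is not available. Below is a minimal base-R sketch of what the loop appears to be aiming for, assuming every column other than site and status should be tested. The set.seed() call and the helper name num_cols are illustrative additions, not from the post, and results is initialised explicitly because the original loop never creates it.

```r
# Rebuild the example data from the post; set.seed() is added only for reproducibility
set.seed(1)
x <- data.frame(matrix(rnorm(200, mean = 5, sd = 0.5), nrow = 20))
colnames(x) <- c("site", "status", paste0("X", 1:8))
x$site <- as.factor(rep(c("A", "A", "B", "B", "C"), 4))
x$status <- as.factor(rep(c("D", "L"), 10))

# Hypothetical helper: the measurement columns to test
num_cols <- setdiff(names(x), c("site", "status"))

# One list element per site; each holds one htest object per measurement column
results <- list()
for (s in levels(x$site)) {
  x1 <- x[x$site == s, ]
  d <- x1$status == "D"   # logical index for the "D" rows of this site
  # lapply() over the columns keeps each one numeric, unlike apply(),
  # which coerces the whole data frame to a (character) matrix first
  results[[s]] <- lapply(x1[num_cols], function(col) t.test(col[d], col[!d]))
}

# Example: the p-value for column X1 at site "A"
results[["A"]][["X1"]]$p.value
```

The loop over sites could itself be replaced by lapply(levels(x$site), ...), which is essentially what the split()-based reply in this thread does.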
Henrique Dallazuanna
2010-Aug-21 14:52 UTC
[R] t.tests on a data.frame using an apply-type function
Try this:

lapply(split(x, x$site), function(.x){
  .xl <- split(.x[-(1:2)], .x$status)
  mapply(t.test, .xl[[1]], .xl[[2]], SIMPLIFY = FALSE)
})

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
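Henrique's call returns a nested list: one element per site, each holding one htest object per measurement column. The following is a sketch (not part of his reply) of how the p-values or t statistics could be collected into a variables-by-sites matrix. The seed and the names res, pvals and tstats are hypothetical.

```r
# Example data as in the original post; set.seed() is added only for reproducibility
set.seed(1)
x <- data.frame(matrix(rnorm(200, mean = 5, sd = 0.5), nrow = 20))
colnames(x) <- c("site", "status", paste0("X", 1:8))
x$site <- as.factor(rep(c("A", "A", "B", "B", "C"), 4))
x$status <- as.factor(rep(c("D", "L"), 10))

# Henrique's approach: split by site, then by status, and t-test column pairs
res <- lapply(split(x, x$site), function(.x) {
  .xl <- split(.x[-(1:2)], .x$status)   # .xl[[1]] = "D" rows, .xl[[2]] = "L" rows
  mapply(t.test, .xl[[1]], .xl[[2]], SIMPLIFY = FALSE)
})

# Pull one component out of every test: rows = variables, columns = sites
pvals  <- sapply(res, function(site_tests) sapply(site_tests, function(tt) tt$p.value))
tstats <- sapply(res, function(site_tests) sapply(site_tests, function(tt) unname(tt$statistic)))
round(pvals, 3)
```

SIMPLIFY = FALSE matters here: it keeps each htest object intact, so any component (estimate, conf.int, parameter) can be extracted the same way afterwards.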
Dennis Murphy
2010-Aug-21 16:24 UTC
[R] t.tests on a data.frame using an apply-type function
Hi:

Henrique's solution is elegant, but if you want to summarize certain features of the test (e.g., the value of the test statistic and its p-value), here's a different approach using the packages reshape and plyr.

# Since your data in group C had a sample size of 2, I redid the data frame using more data.
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(site = factor(rep(c('A', 'B', 'C'), each = 12)),
                status = factor(rep(rep(c('D','L'), each = 6), 3)),
                as.data.frame(m))

# This little trick stacks V1-V8 into a vector called value,
# with an accompanying factor called variable.
library(reshape)    # melt is a function in the reshape package
xm <- melt(x, id = c('site', 'status'))
# xm has four variables: site, status, variable and value.

# We now write a function that does the t-test and outputs
# the value of the test statistic and the (two-sided) p-value.
# To modify the arguments of the t.test call, modify the function
# f accordingly. Ditto if you want to change the outputs.
library(plyr)    # ddply below is a function from this package
f <- function(df) {
  u <- t.test(value ~ status, data = df)
  list(tstat = u$statistic, pval = u$p.value)
}

# The function is applied to all site/variable combinations;
# as.data.frame.function allows the output to be returned
# as variables in a data frame.
u <- ddply(xm, .(site, variable), as.data.frame.function(f))
u
   site variable value.tstat value.pval
1     A       V1 -2.36244305 0.04019757
2     A       V2  0.35853212 0.73105571
3     A       V3 -0.29033960 0.77796762
4     A       V4 -0.39977559 0.69789482
5     A       V5  0.73992896 0.47737988
6     A       V6  2.41243447 0.03823083
7     A       V7  0.37406273 0.71792150
8     A       V8 -0.58363656 0.57388079
9     B       V1  2.03180350 0.06968520
10    B       V2 -0.63778310 0.53794510
11    B       V3  1.66999237 0.12881606
12    B       V4  0.89302839 0.39492211
13    B       V5 -1.42946866 0.18349366
14    B       V6 -0.52158791 0.61836960
15    B       V7  1.44180092 0.18123210
16    B       V8  0.50992197 0.62359868
17    C       V1  1.12246634 0.29033521
18    C       V2  1.06388885 0.31587500
19    C       V3  0.32000364 0.75599890
20    C       V4  0.95363381 0.36327043
21    C       V5 -1.19511893 0.26058768
22    C       V6  1.10885666 0.29526230
23    C       V7 -0.08869988 0.93128143
24    C       V8  2.85254620 0.01892610

HTH,
Dennis
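If installing reshape and plyr is not an option, a similar table can be built in base R. This is a swapped-in technique, not Dennis's code: stack() plays the role of melt() (its columns are called values and ind rather than value and variable), and split() plus do.call(rbind, ...) plays the role of ddply(). The seed and the names vars, groups and rows are illustrative, and the numbers will differ from the table above because the data are random.

```r
# Dennis's larger example data; set.seed() is added only for reproducibility
set.seed(2)
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste("V", 1:8, sep = "")
x <- data.frame(site = factor(rep(c("A", "B", "C"), each = 12)),
                status = factor(rep(rep(c("D", "L"), each = 6), 3)),
                as.data.frame(m))

# Stand-in for melt(): stack() puts V1..V8 into 'values' with a factor 'ind';
# site and status are repeated once per stacked column so the rows line up
vars <- setdiff(names(x), c("site", "status"))
xm <- data.frame(x[rep(seq_len(nrow(x)), length(vars)), c("site", "status")],
                 stack(x[vars]), row.names = NULL)

# Stand-in for ddply(): one Welch t-test of values ~ status per site/variable cell
groups <- split(xm, list(xm$site, xm$ind), drop = TRUE)
rows <- lapply(groups, function(df) {
  tt <- t.test(values ~ status, data = df)
  data.frame(site = df$site[1], variable = df$ind[1],
             tstat = unname(tt$statistic), pval = tt$p.value)
})
u <- do.call(rbind, rows)
u <- u[order(u$site, u$variable), ]   # same row order as the ddply output
rownames(u) <- NULL
u
```

The function body inside lapply() is where the t.test arguments or the extracted outputs would be changed, mirroring the role of f in the plyr version.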