Alison Macalady
2010-Aug-21 14:15 UTC
[R] t.tests on a data.frame using an apply-type function
I have a data.frame with ~250 observations (rows) in each of ~50
categories (columns). I would like to perform t.tests on subsets of
observations within each column, with the subsets according to index
vectors contained in other columns of the data.frame.
My data.frame looks something like this:
x<-data.frame(matrix(rnorm(200,mean=5,sd=.5),nrow=20))
colnames(x)<-c("site", "status", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8")
x$site<-as.factor(rep(c("A", "A", "B", "B", "C"), 4))
x$status<-as.factor(rep(c("D", "L"), 10))
I want to do t.tests on the numeric observations within the data.frame
by "site" and by "status":
t.test(x[x$site == "A" & x$status == "D",]$X1,
       x[x$site == "A" & x$status == "L",]$X1)
t.test(x[x$site == "B" & x$status == "D",]$X1,
       x[x$site == "B" & x$status == "L",]$X1)
t.test(x[x$site == "C" & x$status == "D",]$X1,
       x[x$site == "C" & x$status == "L",]$X1)
t.test(x[x$site == "A" & x$status == "D",]$X2,
       x[x$site == "A" & x$status == "L",]$X2)
t.test(x[x$site == "B" & x$status == "D",]$X2,
       x[x$site == "B" & x$status == "L",]$X2)
t.test(x[x$site == "C" & x$status == "D",]$X2,
       x[x$site == "C" & x$status == "L",]$X2)
etc...
I know I must be able to do this more efficiently using a loop and one
of the apply functions, e.g. something like this:
k=length(levels(x$site))
for (i in 1:k)
{
  site<-levels(x$site)[i]
  x1<-x[x$site == site, ]
  results[i]<-apply(x1, 2, function(x1) {t.test(x1[x1$status == "D",],
                                                x1[x1$status == "L",])})
  results
}
But I can't figure out how to do the apply function correctly...
I also wonder whether there's a way to use an apply-type function and
avoid the loop altogether.
Thanks in advance!
Ali
Henrique Dallazuanna
2010-Aug-21 14:52 UTC
[R] t.tests on a data.frame using an apply-type function
Try this:
lapply(split(x, x$site),
       function(.x){
           .xl <- split(.x[-(1:2)], .x$status)
           mapply(t.test, .xl[[1]], .xl[[2]], SIMPLIFY = FALSE)
       })
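The call above returns a nested list: one element per site, each holding one t.test result per numeric column. As a minimal sketch of how the p-values might be pulled out of that structure (the names `res` and `pvals` and the `set.seed` call are assumptions added for illustration, not part of the original code):

```r
# Rebuild the example data from the original post (seed added for reproducibility).
set.seed(1)
x <- data.frame(matrix(rnorm(200, mean = 5, sd = 0.5), nrow = 20))
colnames(x) <- c("site", "status", paste0("X", 1:8))
x$site   <- factor(rep(c("A", "A", "B", "B", "C"), 4))
x$status <- factor(rep(c("D", "L"), 10))

# One list element per site; each holds one t.test result per numeric column.
res <- lapply(split(x, x$site), function(.x) {
  .xl <- split(.x[-(1:2)], .x$status)   # drop site/status, split rows by status
  mapply(t.test, .xl[[1]], .xl[[2]], SIMPLIFY = FALSE)
})

# Collect the p-values: rows are the X columns, columns are the sites.
pvals <- sapply(res, function(site_res) {
  sapply(site_res, function(tt) tt$p.value)
})
print(round(pvals, 3))
```

A single test is still available in full, e.g. `res$A$X1`, if more than the p-value is needed.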
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Dennis Murphy
2010-Aug-21 16:24 UTC
[R] t.tests on a data.frame using an apply-type function
Hi:
Henrique's solution is elegant, but if you want to summarize certain
features of the test (e.g., the value of the test statistic and its
p-value), then here's a different approach using packages reshape and plyr.
# Since your data in group C had a sample size of 2, I redid the data frame
# using more data.
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(site = factor(rep(c('A', 'B', 'C'), each = 12)),
                status = factor(rep(rep(c('D', 'L'), each = 6), 3)),
                as.data.frame(m))
# This little trick stacks V1-V8 into a vector called value,
# with an accompanying factor called variable.
library(reshape) # melt is a function in the reshape package
xm <- melt(x, id = c('site', 'status'))
# xm has four variables: site, status, variable and value.
# We now write a function that does the t-test and outputs
# the value of the test statistic and the (two-sided) p-value.
# To modify the arguments of the t.test call, modify the function
# f accordingly. Ditto if you want to change the outputs.
library(plyr) # ddply below is a function from this package
f <- function(df) {
    u <- t.test(value ~ status, data = df)
    list(tstat = u$statistic, pval = u$p.value)
}
# The function is applied to all site/variable combinations.
# as.data.frame.function allows the output to be returned
# as variables in a data frame.
u <- ddply(xm, .(site, variable), as.data.frame.function(f))
u
   site variable value.tstat value.pval
1     A       V1 -2.36244305 0.04019757
2     A       V2  0.35853212 0.73105571
3     A       V3 -0.29033960 0.77796762
4     A       V4 -0.39977559 0.69789482
5     A       V5  0.73992896 0.47737988
6     A       V6  2.41243447 0.03823083
7     A       V7  0.37406273 0.71792150
8     A       V8 -0.58363656 0.57388079
9     B       V1  2.03180350 0.06968520
10    B       V2 -0.63778310 0.53794510
11    B       V3  1.66999237 0.12881606
12    B       V4  0.89302839 0.39492211
13    B       V5 -1.42946866 0.18349366
14    B       V6 -0.52158791 0.61836960
15    B       V7  1.44180092 0.18123210
16    B       V8  0.50992197 0.62359868
17    C       V1  1.12246634 0.29033521
18    C       V2  1.06388885 0.31587500
19    C       V3  0.32000364 0.75599890
20    C       V4  0.95363381 0.36327043
21    C       V5 -1.19511893 0.26058768
22    C       V6  1.10885666 0.29526230
23    C       V7 -0.08869988 0.93128143
24    C       V8  2.85254620 0.01892610
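If installing reshape and plyr is not an option, roughly the same summary can probably be built with base R alone. The following is only a sketch under that assumption: the names `vars`, `rows` and `summary_tab` and the `set.seed` call are illustrative additions, and the random data will not reproduce the numbers printed above.

```r
# Same layout as the larger example data above (seed added for reproducibility).
set.seed(1)
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste0("V", 1:8)
x <- data.frame(site   = factor(rep(c("A", "B", "C"), each = 12)),
                status = factor(rep(rep(c("D", "L"), each = 6), 3)),
                as.data.frame(m))

# Every column except the two grouping factors is tested.
vars <- setdiff(names(x), c("site", "status"))

# For each site, run one D-versus-L t-test per variable and keep
# the test statistic and the two-sided p-value.
rows <- lapply(split(x, x$site), function(d) {
  do.call(rbind, lapply(vars, function(v) {
    u <- t.test(d[[v]][d$status == "D"], d[[v]][d$status == "L"])
    data.frame(site = d$site[1], variable = v,
               tstat = unname(u$statistic), pval = u$p.value)
  }))
})

# Stack the per-site pieces into one data frame.
summary_tab <- do.call(rbind, rows)
rownames(summary_tab) <- NULL
print(summary_tab)
```

The result has one row per site/variable combination, so it can be filtered or sorted like any other data frame, e.g. `summary_tab[summary_tab$pval < 0.05, ]`.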
HTH,
Dennis