I am writing a simple R program to execute a t-test repeatedly on data
contained in a data frame. My data looks like this:
Category  Value1  Value2
1         .5      .8
1         .3      .9
...       ...     ...
2         1.4     1.3
2         1.3     1.3
...       ...     ...
15        .2      .3
15        .5      .1
So in all there are 15 categories, and each category contains two sets of
observations which I want to compare. I only want to compare Value1 and
Value2 within each category, but I need to do it 15 times (once for each
category), so I wanted to write an R function to make it easier.
Right now I am using a for() loop to do the comparison. My loop looks like
this:
y <- c()  # collects one p-value per category
for (i in 1:15)
{
  x <- t.test(Value1[Category == i], Value2[Category == i])
  y <- c(y, x$p.value)
}
The loop runs and everything is working well. However, I am not sure how to
translate this code into a function. In particular, I'm not sure how to
write a function that takes a data frame ds (containing Category, Value1,
and Value2 as members) as an argument and then accesses these members
within the body of the function. I've tried the following:
repeated_test <- function(data)
{
  y <- c()  # collects one p-value per category
  for (i in 1:15)
  {
    x <- t.test(data$Value1[data$Category == i],
                data$Value2[data$Category == i])
    y <- c(y, x$p.value)
  }
  y
}
This will run, but only if the members of the data frame I am passing as an
argument are in fact named Value1, Value2, and Category. This is fine for
now, but in the future I will have to run this function on data where I
cannot be sure this is the case. Rather than change the member names by
hand, I would like to make the function generic to work with any data frame.
How do I do this? Or is there a better way to do this without the for()
loop (for example, using apply())?
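One detail that bears on the question: `$` takes the column name literally, while `[[` accepts a name stored in a variable. The snippet below is only a small illustration; the data frame `df` and the variable `col` are made up for the example and are not part of the original data.

```r
# Made-up data frame, for illustration only
df <- data.frame(Category = c(1, 1, 2), Score = c(0.5, 0.3, 1.4))

# The column name is held in a variable
col <- "Score"

by_dollar  <- df$col     # `$` looks for a column literally called "col"
by_bracket <- df[[col]]  # `[[` looks up the name stored in `col`

is.null(by_dollar)       # no column is called "col"
by_bracket               # the Score column
```

So a function that receives column names as character strings can reach the columns with `data[[name]]`, whatever they happen to be called.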
Henrique Dallazuanna
2008-May-02 14:51 UTC
[R] Accesing data frame members from within functions
Try:
foo <- function(data, ...)
{
    # p-value of a t-test on Value1 vs. Value2, one per Category
    res <- unlist(lapply(split(data, data$Category),
        function(.x) t.test(.x$Value1, .x$Value2)$p.value))
    # by.y = 0 matches Category against the row names of res
    test <- merge(data, as.data.frame(res), by.x = "Category", by.y = 0)
    return(test)
}
x <- data.frame(Category = rep(1:15, each = 10), Value1 = rnorm(150),
                Value2 = rnorm(150))
foo(x)
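The function above still assumes the columns are called Category, Value1 and Value2. A minimal sketch of one way to drop that assumption is to pass the column names as strings and look them up with `[[`. The function name `pvalues_by_group`, its arguments `group`, `v1` and `v2`, and the demo data are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper: column names arrive as strings, so the data frame
# does not need columns called Category, Value1 and Value2.
pvalues_by_group <- function(data, group, v1, v2) {
  # split() yields one data frame per level of the grouping column
  pieces <- split(data, data[[group]])
  # vapply() insists that every piece returns exactly one number
  vapply(pieces,
         function(d) t.test(d[[v1]], d[[v2]])$p.value,
         numeric(1))
}

# Demo data with different column names (made up for the example)
set.seed(1)
demo <- data.frame(grp = rep(1:15, each = 10),
                   a = rnorm(150),
                   b = rnorm(150))
p <- pvalues_by_group(demo, "grp", "a", "b")
```

The result is a numeric vector with one p-value per group, named after the group levels. It could be merged back into the data frame in the same way as in foo() if that is needed.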
On Fri, May 2, 2008 at 11:19 AM, David Schwab <dvschwab46225@gmail.com>
wrote:
> [...]
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Jorge Ivan Velez
2008-May-02 15:47 UTC
[R] Accesing data frame members from within functions
Hi David,
Try this:
# Data set
set.seed(123)
Category <- as.factor(rep(1:15, each = 10))
Value1 <- rnorm(150)
Value2 <- rnorm(150)
yourdata <- data.frame(Category, Value1, Value2)

# Global function
TTEST <- function(mydata) {
  # Internal function: p-value of a two-sample t-test
  tt <- function(x, y) t.test(x, y)$p.value
  # p-values, one per level of Category
  lev <- levels(mydata$Category)
  res <- numeric(length(lev))
  for (i in seq_along(lev)) {
    # Rows for this level, with the Category column dropped
    mydatai <- mydata[mydata$Category == lev[i], ][, -1]
    res[i] <- tt(mydatai[, 1], mydatai[, 2])
  }
  # Result
  data.frame(Category = lev, pvalue = res)
}
TTEST(yourdata)
Category pvalue
1 1 0.88699832
2 2 0.87711367
3 3 0.26075787
4 4 0.30382321
5 5 0.59213871
6 6 0.83755043
7 7 0.47836246
8 8 0.37509850
9 9 0.26132601
10 10 0.29195145
11 11 0.24169206
12 12 0.25594943
13 13 0.34882014
14 14 0.85755554
15 15 0.04556924
HTH,
Jorge
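On the question of whether the for() loop can be avoided: the loop and the split() approach should give the same p-values, so the choice is mostly one of style. The sketch below checks this on simulated data. The seed, the data frame `d` and the variable names are assumptions made for the example, not part of the original post.

```r
# Simulated data in the same shape as the original (made up for the example)
set.seed(42)
d <- data.frame(Category = rep(1:15, each = 10),
                Value1 = rnorm(150),
                Value2 = rnorm(150))

# Loop version: preallocate, then fill one slot per category
cats <- sort(unique(d$Category))
p_loop <- numeric(length(cats))
for (k in seq_along(cats)) {
  rows <- d$Category == cats[k]
  p_loop[k] <- t.test(d$Value1[rows], d$Value2[rows])$p.value
}

# split() + sapply() version: one t-test per piece, no explicit loop
p_apply <- sapply(split(d, d$Category),
                  function(g) t.test(g$Value1, g$Value2)$p.value)

# Compare the two results, ignoring the names that sapply() adds
same <- isTRUE(all.equal(p_loop, unname(p_apply)))
```

If `same` is TRUE, both versions computed identical tests. The sapply() version also labels each p-value with its category, which makes it easier to check by eye.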
On Fri, May 2, 2008 at 10:19 AM, David Schwab <dvschwab46225@gmail.com>
wrote:
> [...]