I am writing a simple R program to execute a t-test repeatedly on data contained in a data frame. My data looks like this:

    Category  Value1  Value2
    1         .5      .8
    1         .3      .9
    . . .     . . .   . . .
    2         1.4     1.3
    2         1.3     1.3
    . . .     . . .   . . .
    15        .2      .3
    15        .5      .1

So in all there are 15 categories, and each category contains two sets of observations which I want to compare. I only want to compare Value1 and Value2 within each category, but I need to do it 15 times (once for each category), so I wanted to write an R function to make it easier.

Right now I am using a for() loop to do the comparison. My loop looks like this:

    for(i in 1:21)
    {
        x <- t.test(Value1[Category == i], Value2[Category == i])
        y <- c(y, x$p.value)
    }

The loop runs and everything is working well. However, I am not sure how to translate this code into a function. In particular, I'm not sure how to write a function that takes a data frame ds (containing Category, Value1, and Value2 as members) as an argument and then accesses these members within the body of the function. I've tried the following:

    repeated_test <- function(data)
    {
        for(i in 1:21)
        {
            x <- t.test(ds$Value1[ds$Category == i], ds$Value2[ds$Category == i])
            y <- c(y, x$p.value)
        }
    }

This will run, but only if the members of the data frame I am passing as an argument are in fact named Value1, Value2, and Category. This is fine for now, but in the future I will have to run this function on data where I cannot be sure this is the case. Rather than change the member names by hand, I would like to make the function generic so that it works with any data frame. How do I do this? Or is there a better way to do this without the for() loop (for example, using apply())?
Henrique Dallazuanna
2008-May-02 14:51 UTC
[R] Accessing data frame members from within functions
Try:

    foo <- function(data, ...) {
        res <- unlist(lapply(split(data, data$Category),
                             function(.x) t.test(.x$Value1, .x$Value2)$p.value))
        test <- merge(data, as.data.frame(res), by.x = "Category", by.y = 0)
        return(test)
    }

    x <- data.frame(Category = rep(1:15, each = 10),
                    Value1 = rnorm(150),
                    Value2 = rnorm(150))
    foo(x)

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
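The split()/lapply() idea above still hard-codes the names Category, Value1, and Value2. One way to lift that restriction, sketched here under the assumption that the caller knows the column names and passes them as strings, is to index with `[[ ]]` instead of `$`. The function name ttest_by_group and its arguments group, v1, and v2 are hypothetical and not part of the original reply.

```r
# Hypothetical helper: one two-sample t-test per category, with the
# column names supplied as character strings (defaults match the thread).
ttest_by_group <- function(data, group = "Category",
                           v1 = "Value1", v2 = "Value2") {
  # split() returns a named list of data frames, one per category
  pieces <- split(data, data[[group]])
  # data[[name]] looks a column up by a name held in a variable
  pvals <- sapply(pieces, function(d) t.test(d[[v1]], d[[v2]])$p.value)
  data.frame(group = names(pvals), pvalue = unname(pvals))
}

# Usage on simulated data whose columns have different names
set.seed(1)
x <- data.frame(g = rep(1:15, each = 10), a = rnorm(150), b = rnorm(150))
out <- ttest_by_group(x, group = "g", v1 = "a", v2 = "b")
```

Unlike foo(), this sketch returns one row per category rather than merging the p-values back onto every observation; whether that is preferable depends on what is done with the result.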
Jorge Ivan Velez
2008-May-02 15:47 UTC
[R] Accessing data frame members from within functions
Hi David,

Try this:

    # Data set
    set.seed(123)
    Category = as.factor(rep(1:15, each = 10))
    Value1 = rnorm(150)
    Value2 = rnorm(150)
    yourdata = data.frame(Category, Value1, Value2)

    # Global function
    TTEST = function(mydata) {
        # Internal function: p-value of a two-sample t-test
        tt = function(x, y) t.test(x, y)$p.value

        # Category levels, taken from the argument rather than the global workspace
        lev = levels(mydata$Category)
        res = numeric(length(lev))   # initialise the result vector before filling it

        # p-values
        for (i in seq_along(lev)) {
            mydatai = mydata[mydata$Category == lev[i], ][, -1]
            res[i] = tt(mydatai[, 1], mydatai[, 2])
        }

        # Result
        data.frame(Category = lev, pvalue = res)
    }

    TTEST(yourdata)

       Category     pvalue
    1         1 0.88699832
    2         2 0.87711367
    3         3 0.26075787
    4         4 0.30382321
    5         5 0.59213871
    6         6 0.83755043
    7         7 0.47836246
    8         8 0.37509850
    9         9 0.26132601
    10       10 0.29195145
    11       11 0.24169206
    12       12 0.25594943
    13       13 0.34882014
    14       14 0.85755554
    15       15 0.04556924

HTH,

Jorge
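For the original question of how to reach the data frame's members from inside a function without fixing their names, a loop-based sketch is shown below. It assumes the column names are passed as strings and that the grouping column has no missing values; the argument names group, v1, and v2 are hypothetical. It also loops over the categories actually present instead of a fixed 1:21, and preallocates the result rather than growing y.

```r
# Hypothetical rewrite of the loop: columns are looked up by name with [[ ]],
# so the function works on the data frame it is given, not on a global object.
repeated_test <- function(data, group, v1, v2) {
  groups <- unique(data[[group]])       # categories present in the data
  p <- numeric(length(groups))          # preallocated vector of p-values
  for (k in seq_along(groups)) {
    rows <- data[[group]] == groups[k]  # logical index for this category
    p[k] <- t.test(data[[v1]][rows], data[[v2]][rows])$p.value
  }
  names(p) <- groups                    # label each p-value with its category
  p
}

# Usage with simulated data shaped like the example in the question
set.seed(123)
ds <- data.frame(Category = rep(1:15, each = 10),
                 Value1 = rnorm(150), Value2 = rnorm(150))
pv <- repeated_test(ds, "Category", "Value1", "Value2")
```

The key point is that `data$Value1` requires the literal name, whereas `data[["Value1"]]` accepts any string, including one stored in a function argument.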