Hi everyone-

I have a dataset with multiple "pre" and "post" variables I want to compare. The variables are named "apple_pre" or "pre_banana" with the corresponding post variables named "apple_post" or "post_banana". The variables are in no particular order.

          apple_pre  orange_pre  orange_post  pre_banana  apple_post  post_banana
person_1
person_2
person_3
...
person_x

How do I:
1. Run a series of paired t-tests for the apple_pre variables and pre_banana variables? Would be great to do something like ttest(*.*pre*.*,*.*post*.*).
2. Print the results from these t-tests in a table with col 1 = mean difference, col 2 = 95% conf interval, col 3 = p-value.

Thank you kindly,
-Shantanu

Shantanu Nundy, M.D.
University of Chicago
Hello,

Could you post a data example? With a data.frame named 'dat', use

dput( head(dat, 30) )  # paste the output of this in a post

I have written code that creates pairs of pre/post columns, but it can't really be tested without data.

Hope this helps,

Rui Barradas
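As a small illustration of the dput() request above: dput() prints R code that rebuilds an object, so a reader can paste it back and get the same data. The sketch below uses a made-up three-row data frame (the column values are invented, not the poster's data) and captures the text only so the round trip can be checked.

```r
# Toy data frame standing in for the real data (values are invented).
dat <- data.frame(apple_pre  = c(12, 15, 11),
                  apple_post = c(20, 22, 25))

# dput() writes out R code that recreates the object; capture.output()
# collects that code as text instead of printing it.
txt <- capture.output(dput(head(dat, 30)))

# Pasting the text back into R (here via parse/eval) rebuilds the data.
dat_copy <- eval(parse(text = paste(txt, collapse = "\n")))
all.equal(dat, dat_copy)
```

head(dat, 30) simply keeps the posted example short when the real data has many rows.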
Hi,

Maybe this helps you.

set.seed(1)
dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE),
                   banana_pre = sample(25:35, 5, replace = TRUE),
                   apple_post = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE))
# one data frame per fruit, holding its pre and post columns
list2 <- list(dat2[regmatches(colnames(dat2), regexpr("apple.*", colnames(dat2)))],
              dat2[regmatches(colnames(dat2), regexpr("banana.*", colnames(dat2)))])
# paired t-test per fruit, then one result row per test
res2 <- do.call(rbind,
                lapply(lapply(list2, function(x) t.test(x[,1], x[,2], paired = TRUE)),
                       function(x) data.frame(meandifference = x$estimate,
                                              CIlow = unlist(x$conf.int)[1],
                                              CIhigh = unlist(x$conf.int)[2],
                                              p.value = x$p.value)))
# note: 'dat1' here is a typo for 'dat2'; see the correction in the follow-up message
row.names(res2) <- unlist(unique(lapply(strsplit(colnames(dat1), "_"), `[`, 1)))
res2
#       meandifference     CIlow    CIhigh     p.value
#apple            -9.8 -15.02385 -4.576150 0.006477650
#banana          -15.4 -21.64546 -9.154541 0.002382261

A.K.
[[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi,

A typo in my solution. The row names line should use dat2, not dat1:

row.names(res2) <- unlist(unique(lapply(strsplit(colnames(dat2), "_"), `[`, 1)))

A.K.
Hi,

If you have a lot of variables and in no order, then it would be better to order the data by column names. For e.g.

set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   banana_pre = sample(25:35, 5, replace = TRUE),
                   apple_post = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE),
                   orange_pre = sample(5:10, 5, replace = TRUE))
dat3 <- dat2[order(colnames(dat2))]  # order the columns
list3 <- list(dat3[,1:2], dat3[,3:4], dat3[,5:6])
res3 <- do.call(rbind,
                lapply(lapply(list3, function(x) t.test(x[,1], x[,2], paired = TRUE)),
                       function(x) data.frame(meandifference = x$estimate,
                                              CIlow = unlist(x$conf.int)[1],
                                              CIhigh = unlist(x$conf.int)[2],
                                              p.value = x$p.value)))
row.names(res3) <- unlist(unique(lapply(strsplit(colnames(dat3), "_"), `[`, 1)))
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

A.K.
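The hard-coded list of column pairs (columns 1:2, 3:4, 5:6) gets tedious with many variables. A minimal sketch of one way to build the pairs from the names instead, assuming every variable ends in "_pre" or "_post"; the data below are invented and the object names (stem, by_stem) are illustrative, not from the posts above.

```r
# Made-up data: three pre/post pairs, all using the suffix convention.
set.seed(1)
dat <- data.frame(apple_pre   = rnorm(8, 10),
                  orange_post = rnorm(8, 14),
                  banana_pre  = rnorm(8, 20),
                  apple_post  = rnorm(8, 12),
                  banana_post = rnorm(8, 23),
                  orange_pre  = rnorm(8, 11))

# Strip the "_pre"/"_post" suffix to get one stem per pair.
stem <- sub("_(pre|post)$", "", names(dat))

# split.default() splits the columns (not the rows) of a data frame by 'stem'.
by_stem <- split.default(dat, stem)

# Look the columns up by name so the direction (post minus pre) is explicit.
res <- t(sapply(names(by_stem), function(s) {
  tt <- t.test(by_stem[[s]][[paste0(s, "_post")]],
               by_stem[[s]][[paste0(s, "_pre")]], paired = TRUE)
  c(meandifference = unname(tt$estimate),
    CIlow = tt$conf.int[1], CIhigh = tt$conf.int[2],
    p.value = tt$p.value)
}))
res
```

Selecting the two columns by name rather than by position avoids depending on how the columns happen to sort.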
Hello,

If that is the problem now, then change the variables' names. In what follows, the first line is just the example you gave. In the actual running code, uncomment the commented-out lines.

vars <- c("red_apple_pre", "post_banana_organic")
#vars <- names(dat)
# mark the underscore next to pre/post with "=" so it survives the next step
vars <- gsub("_pre", "=pre", vars)
vars <- gsub("_post", "=post", vars)
vars <- gsub("pre_", "pre=", vars)
vars <- gsub("post_", "post=", vars)
# turn every remaining underscore into a dot
vars <- gsub("_", "\\.", vars)
# restore the single underscore that separates the name from pre/post
vars <- sub("=", "_", vars)
#names(dat) <- vars

Rui Barradas

On 11-10-2012 15:17, Nundy, Shantanu wrote:
> Actually, I see now that part of the problem is that many of the names have multiple underscores such as "red_apple_pre" or "post_banana_organic". I think this is causing a problem for this line in your code:
>> vmat <- do.call(rbind, strsplit(vars, "_"))
> Shantanu
>
> ________________________________________
> From: Nundy, Shantanu
> Sent: Thursday, October 11, 2012 9:07 AM
> To: Rui Barradas
> Subject: RE: [R] multiple t-tests across similar variable names
>
> Rui,
> Thank you so much for your solution. It is exactly what I was struggling with!
>
> One small question. When I ran the code on my actual dataset I got the error below:
>
>> vars <- names(master)
>> vmat <- do.call(rbind, strsplit(vars, "_"))
> Warning message:
> In function (..., deparse.level = 1) :
> number of columns of result is not a multiple of vector length (arg 1)
>
> My guess is that the problem is that not all the variables have "pre" or "post" in them. Some of the variables are constants that I will not do a paired t-test on. What would be the easiest way to get around this, perhaps even by simply removing all of the variables that have neither "pre" nor "post" in them?
>
> Thanks again,
> Shantanu
>
> ________________________________________
> From: arun [smartpink111 at yahoo.com]
> Sent: Thursday, October 11, 2012 8:50 AM
> To: Rui Barradas
> Cc: Nundy, Shantanu
> Subject: Re: [R] multiple t-tests across similar variable names
>
> HI Rui,
>
> Thanks for testing the code. I will look into it later.
> A.K.
>
> ----- Original Message -----
> From: Rui Barradas <ruipbarradas at sapo.pt>
> To: arun <smartpink111 at yahoo.com>; "Nundy, Shantanu" <snundy at chicagobooth.edu>
> Cc: R help <r-help at r-project.org>
> Sent: Thursday, October 11, 2012 9:25 AM
> Subject: Re: [R] multiple t-tests across similar variable names
>
> Hello,
>
> I have a problem, with your data example my results are different. I have changed the names of two of the variables, to allow for 'pre' and 'post' to be first in the names.
>
> # auxiliary functions
> ifswap <- function(x)
>     if(x[1] %in% c("pre", "post")) x[2:1] else x
>
> getpair <- function(i, post)
>     post[ which(vmat[post, 1] == vmat[i, 1]) ]
>
> makeLine <- function(h)
>     c(MeanDiff = unname(h$estimate),
>       CIlower = h$conf.int[1],
>       CIupper = h$conf.int[2],
>       p.value = h$p.value)
>
> doTests <- function(DF, Pairs){
>     t.list <- lapply( seq_len(nrow(Pairs)), function(i)
>         t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
>     do.call(rbind, lapply(t.list, makeLine))
> }
>
> # dataset
> set.seed(432)
> dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
>             orange_post = sample(18:28,5,replace=TRUE),
>             pre_banana = sample(25:35,5,replace=TRUE),  # here
>             apple_post = sample(20:30,5,replace=TRUE),
>             post_banana = sample(40:50,5,replace=TRUE), # and here
>             orange_pre = sample(5:10,5,replace=TRUE))
>
> #--------------------------------
> # start processing the data.frame
> # Make pairs of pre/post columns
> vars <- names(dat2)
> vmat <- do.call(rbind, strsplit(vars, "_"))
> vmat <- t(apply(vmat, 1, ifswap))
> pre <- which(vmat[, 2] == "pre")
> post <- which(vmat[, 2] == "post")
> post <- sapply(pre, getpair, post)
> pairs <- matrix(c(pre, post), ncol = 2)
>
> # now the tests
> result <- doTests(dat2, pairs)
> rownames(result) <- vmat[pre, 1]
> result
>
> In your results I believe that the values for meandifference are the means of x[, 1], at least that's what I've got.
> Anyway, I'll see both codes again, to try to see what's going on.
>
> Hope this helps,
>
> Rui Barradas
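On the rbind() warning quoted in this message: strsplit() returns pieces of different lengths when names have extra underscores or no pre/post part at all, so the pieces cannot be stacked into a clean two-column matrix. The sketch below shows this on hypothetical column names ("id" and "site_name" are invented stand-ins for the constants) and one possible way to keep only the pre/post columns, assuming "pre" or "post" always appears as a whole underscore-delimited token.

```r
# Hypothetical column names: two constants plus names with extra underscores.
vars <- c("id", "site_name", "red_apple_pre", "red_apple_post",
          "pre_banana_organic", "post_banana_organic")

# The split pieces have different lengths, which is what rbind() warns about.
lengths(strsplit(vars, "_"))

# Keep only names in which "pre" or "post" is a whole token.
token      <- "(^|_)(pre|post)(_|$)"
is_prepost <- grepl(token, vars)
kept       <- vars[is_prepost]

# Period (pre or post) and stem (the name with that token removed).
period <- ifelse(grepl("(^|_)pre(_|$)", kept), "pre", "post")
stem   <- gsub("^_|_$", "", sub(token, "\\3", kept))
data.frame(column = kept, stem = stem, period = period)
```

Columns that share a stem and differ only in period are the pairs to test; everything filtered out by is_prepost is left alone.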
Hi Shantanu,

I guess the below code should solve both the issues:

set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   pre_banana = sample(25:35, 5, replace = TRUE),
                   post_apple = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE),
                   orange_pre = sample(5:10, 5, replace = TRUE))
# move a leading "pre_"/"post_" to the end of the name
colnames(dat2) <- gsub("^pre\\_(.*)", "\\1_pre", gsub("^post\\_(.*)", "\\1_post", colnames(dat2)))
# transpose so each variable is a row, then split the rows by fruit name
dat3 <- t(dat2[order(colnames(dat2))])
dat3 <- data.frame(varName = gsub("(.*)\\_.*", "\\1", row.names(dat3)), dat3)
list3 <- lapply(split(dat3, dat3$varName), function(x) t(x[-1]))
res3 <- do.call(rbind,
                lapply(lapply(list3, function(x) t.test(x[,1], x[,2], paired = TRUE)),
                       function(x) data.frame(meandifference = x$estimate,
                                              CIlow = unlist(x$conf.int)[1],
                                              CIhigh = unlist(x$conf.int)[2],
                                              p.value = x$p.value)))
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

A.K.

----- Original Message -----
From: "Nundy, Shantanu" <snundy at chicagobooth.edu>
To: arun <smartpink111 at yahoo.com>
Cc:
Sent: Thursday, October 11, 2012 10:22 AM
Subject: RE: [R] multiple t-tests across similar variable names

hi Arun,

This is very helpful thanks. I'm running into a couple issues:
1. Since some of the variables start with "pre_apple" and others "apple_post" sorting the variables doesn't completely put pre-post variables next to each other.
2. I have about 50 variables so typing this line is a bit cumbersome:
> list3<-list(dat3[,1:2],dat3[,3:4],dat3[,5:6])

Thanks,
Shantanu

________________________________________
From: arun [smartpink111 at yahoo.com]
Sent: Thursday, October 11, 2012 9:14 AM
To: Rui Barradas
Cc: Nundy, Shantanu; R help
Subject: Re: [R] multiple t-tests across similar variable names

HI Rui,

By running your code, I got the results as:
result
#       MeanDiff   CIlower    CIupper      p.value
#apple     -12.6 -16.68052  -8.519476 0.0010166626
#banana    -15.0 -17.91196 -12.088040 0.0001388506
#orange    -18.2 -22.79583 -13.604166 0.0003888560

From my code:
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

There is difference in signs.
A.K.
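The sign difference reported here follows from argument order. Rui's pairs matrix passes the pre column first, so t.test() estimates pre minus post. Sorting column names alphabetically puts "apple_post" before "apple_pre", so t.test(x[,1], x[,2]) on the sorted data estimates post minus pre. A small sketch with invented numbers:

```r
# Made-up paired measurements.
pre  <- c(12, 15, 11, 14, 13)
post <- c(20, 22, 25, 21, 24)

a <- t.test(pre, post, paired = TRUE)   # estimates mean(pre - post)
b <- t.test(post, pre, paired = TRUE)   # estimates mean(post - pre)

unname(a$estimate)        # negative here, because post > pre
unname(b$estimate)        # same size, opposite sign
c(a$p.value, b$p.value)   # the p-value does not depend on the order

# Alphabetical sorting places the post column first.
sort(c("apple_pre", "apple_post"))
```

The confidence limits swap and change sign in the same way, so the two result tables in the thread agree once the direction of the difference is fixed.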
Hi Shantanu,

I saw your reply to Rui regarding multiple underscores in Nabble:

(Actually, I see now that part of the problem is that many of the names have multiple underscores such as "red_apple_pre" or "post_banana_organic". I think this is causing a problem for this line in your code:)

I wasn't aware of that problem. In that case, try this:

set.seed(432)
dat2 <- data.frame(red_apple_pre = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   pre_banana_organic = sample(25:35, 5, replace = TRUE),
                   post_apple = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE),
                   orange_pre = sample(5:10, 5, replace = TRUE))
nam1 <- c("apple", "orange", "banana")
nam2 <- c("pre", "post")
# keep only the fruit name and the pre/post token from each column name
colnames(dat2) <- unlist(lapply(lapply(strsplit(colnames(dat2), "_"),
                                       function(x) x[x %in% nam1 | x %in% nam2]),
                                function(x) paste(x[1], x[2], sep = "_")))
# move a leading "pre_"/"post_" to the end of the name
colnames(dat2) <- gsub("^pre\\_(.*)", "\\1_pre", gsub("^post\\_(.*)", "\\1_post", colnames(dat2)))
dat3 <- t(dat2[order(colnames(dat2))])
dat3 <- data.frame(varName = gsub("(.*)\\_.*", "\\1", row.names(dat3)), dat3)
list3 <- lapply(split(dat3, dat3$varName), function(x) t(x[-1]))
res3 <- do.call(rbind,
                lapply(lapply(list3, function(x) t.test(x[,1], x[,2], paired = TRUE)),
                       function(x) data.frame(meandifference = x$estimate,
                                              CIlow = unlist(x$conf.int)[1],
                                              CIhigh = unlist(x$conf.int)[2],
                                              p.value = x$p.value)))
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

I hope this works.
A.K.
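For readers who want the pieces of this thread in one place, here is a rough sketch that combines them without listing the variable names by hand. paired_tests() is a hypothetical helper, not code from any of the posters; it assumes "pre" or "post" appears exactly once in each relevant name as a whole underscore-delimited token, reports post minus pre, and silently skips stems without a complete pair. The data are invented.

```r
# Hypothetical helper: paired t-test for every pre/post pair found by name.
paired_tests <- function(dat) {
  token  <- "(^|_)(pre|post)(_|$)"
  vars   <- names(dat)[grepl(token, names(dat))]        # drop constants
  period <- ifelse(grepl("(^|_)pre(_|$)", vars), "pre", "post")
  stem   <- gsub("^_|_$", "", sub(token, "\\3", vars))  # name minus the token
  out <- lapply(unique(stem), function(s) {
    pre_col  <- vars[stem == s & period == "pre"]
    post_col <- vars[stem == s & period == "post"]
    # skip stems that do not have exactly one pre and one post column
    if (length(pre_col) != 1 || length(post_col) != 1) return(NULL)
    tt <- t.test(dat[[post_col]], dat[[pre_col]], paired = TRUE)  # post - pre
    data.frame(variable = s, meandifference = unname(tt$estimate),
               CIlow = tt$conf.int[1], CIhigh = tt$conf.int[2],
               p.value = tt$p.value, stringsAsFactors = FALSE)
  })
  do.call(rbind, Filter(Negate(is.null), out))
}

# Invented data: a constant, two complete pairs, and one unpaired column.
set.seed(2)
dat <- data.frame(id = 1:10,
                  red_apple_pre = rnorm(10, 10),
                  pre_banana_organic = rnorm(10, 20),
                  red_apple_post = rnorm(10, 12),
                  post_banana_organic = rnorm(10, 21),
                  orange_pre = rnorm(10, 5))   # no matching post column
paired_tests(dat)
```

Matching on whole tokens means a name such as "pressure_x" would not be mistaken for a pre column, but names that contain both tokens would need extra handling.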