thr3ads.net - R help - [R] multiple t-tests across similar variable names [Oct 2012]

Nundy, Shantanu

2012-Oct-10 23:09 UTC

[R] multiple t-tests across similar variable names

Hi everyone-

I have a dataset with multiple "pre" and "post" variables I
want to compare. The variables are named "apple_pre" or
"pre_banana" with the corresponding post variables named
"apple_post" or "post_banana". The variables are in no
particular order.

apple_pre orange_pre orange_post pre_banana apple_post post_banana
person_1
person_2
person_3
...
person_x


How do I:
1. Run a series of paired t-tests for the apple_pre variables and pre_banana
variables? Would be great to do something like ttest(*.*pre*.*,*.*post*.*).
2. Print the results from these t-tests in a table with col 1=mean difference,
col 2= 95% conf interval, col 3=p-value.

Thank you kindly,
-Shantanu

Shantanu Nundy, M.D.
University of Chicago

	[[alternative HTML version deleted]]

Rui Barradas

2012-Oct-11 01:38 UTC

Hello,

Could you post a data example? With a data.frame named 'dat', use

dput( head(dat, 30) )  # paste the output of this in a post

I have written code that pairs the pre/post columns, but it can't
really be tested without data.

Hope this helps,

Rui Barradas
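
For readers unfamiliar with dput(), here is a minimal sketch of what the request above amounts to. The data frame 'dat' below is made up for illustration; the point is only that the text printed by dput() can be pasted into a message and evaluated by someone else to rebuild the same object.

```r
# Made-up data frame standing in for the poster's real data
dat <- data.frame(apple_pre  = c(10, 12, 11),
                  apple_post = c(21, 20, 25))

# Print a plain-text representation that can be pasted into a post
dput(head(dat, 30))

# Round trip: capture that text and rebuild the object from it
txt <- capture.output(dput(head(dat, 30)))
restored <- eval(parse(text = paste(txt, collapse = "\n")))
```

Because the pasted text is ordinary R code, a helper on the list can recreate the first rows of the data exactly instead of guessing at their structure.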

arun

2012-Oct-11 03:22 UTC

Hi,
Maybe this helps.
set.seed(1)
dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE),
                   banana_pre = sample(25:35, 5, replace = TRUE),
                   apple_post = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE))
# one data frame per variable: its pre column, then its post column
list2 <- list(dat2[regmatches(colnames(dat2), regexpr("apple.*", colnames(dat2)))],
              dat2[regmatches(colnames(dat2), regexpr("banana.*", colnames(dat2)))])
res2 <- do.call(rbind, lapply(lapply(list2, function(x)
    t.test(x[, 1], x[, 2], paired = TRUE)), function(x)
    data.frame(meandifference = x$estimate, CIlow = unlist(x$conf.int)[1],
               CIhigh = unlist(x$conf.int)[2], p.value = x$p.value)))
# typo: dat1 should be dat2 (corrected in the follow-up message)
row.names(res2) <- unlist(unique(lapply(strsplit(colnames(dat1), "_"), `[`, 1)))
res2
#       meandifference     CIlow    CIhigh     p.value
#apple            -9.8 -15.02385 -4.576150 0.006477650
#banana          -15.4 -21.64546 -9.154541 0.002382261

A.K.
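
A possible generalisation of the approach above, so that the variable stems need not be typed by hand. This is only a sketch, not the code posted in this thread: it assumes every variable of interest ends in "_pre" or "_post", the helper name paired_table is made up, and the toy data are invented for illustration.

```r
# Sketch: pair columns that share a stem and end in "_pre" / "_post".
# 'paired_table' is a hypothetical helper name, not part of any package.
paired_table <- function(df) {
  pre_cols <- grep("_pre$", names(df), value = TRUE)
  stems <- sub("_pre$", "", pre_cols)
  # keep only stems that also have a matching "_post" column
  stems <- stems[paste0(stems, "_post") %in% names(df)]
  rows <- lapply(stems, function(s) {
    # post first, then pre: a positive mean difference is an increase
    tt <- t.test(df[[paste0(s, "_post")]], df[[paste0(s, "_pre")]],
                 paired = TRUE)
    data.frame(meandifference = unname(tt$estimate),
               CIlow = tt$conf.int[1],
               CIhigh = tt$conf.int[2],
               p.value = tt$p.value,
               row.names = s)
  })
  do.call(rbind, rows)
}

# Small invented data set
toy <- data.frame(apple_pre   = c(10, 12, 11, 14, 13),
                  banana_pre  = c(25, 27, 30, 26, 29),
                  apple_post  = c(21, 20, 25, 22, 27),
                  banana_post = c(41, 45, 44, 40, 48))
res <- paired_table(toy)
res
```

Looking the columns up by name rather than by position means the column order in the data frame does not matter, which was part of the original question.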





arun

2012-Oct-11 03:27 UTC

Hi,
A typo in my solution: the last line should use dat2, not dat1:
row.names(res2)<-unlist(unique(lapply(strsplit(colnames(dat2),"_"),`[`,1)))

A.K.




arun

2012-Oct-11 04:31 UTC

Hi,

If you have a lot of variables in no particular order, it would be better to
order the columns by name first.
For example:
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   banana_pre = sample(25:35, 5, replace = TRUE),
                   apple_post = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE),
                   orange_pre = sample(5:10, 5, replace = TRUE))
dat3 <- dat2[order(colnames(dat2))] # order the columns
list3 <- list(dat3[, 1:2], dat3[, 3:4], dat3[, 5:6])
res3 <- do.call(rbind, lapply(lapply(list3, function(x)
    t.test(x[, 1], x[, 2], paired = TRUE)), function(x)
    data.frame(meandifference = x$estimate, CIlow = unlist(x$conf.int)[1],
               CIhigh = unlist(x$conf.int)[2], p.value = x$p.value)))
row.names(res3) <- unlist(unique(lapply(strsplit(colnames(dat3), "_"), `[`, 1)))
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

A.K.




Rui Barradas

2012-Oct-11 14:51 UTC

Hello,

If that is the problem now, then change the variables' names.
In what follows, the first line is just the example you gave. In the
actual running code, uncomment the commented-out lines.

vars <-  c("red_apple_pre", "post_banana_organic")
#vars <- names(dat)
vars <- gsub("_pre", "=pre", vars)
vars <- gsub("_post", "=post", vars)
vars <- gsub("pre_", "pre=", vars)
vars <- gsub("post_", "post=", vars)
vars <- gsub("_", "\\.", vars)
vars <- sub("=", "_", vars)
#names(dat) <- vars

Rui Barradas
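
An alternative to rewriting the names is to split each name into a stem and a phase, wherever the "pre"/"post" token sits. The following is only a sketch and not the code posted above: the helper name split_name is made up, the extra name "age" stands in for a column with neither token, and joining the remaining parts with "." is an arbitrary choice.

```r
# Sketch: split a column name into a stem and a phase ("pre"/"post").
# 'split_name' is a hypothetical helper, not from any package.
split_name <- function(nm) {
  parts <- strsplit(nm, "_", fixed = TRUE)[[1]]
  is_phase <- parts %in% c("pre", "post")
  # exactly one pre/post token is expected; otherwise mark as missing
  phase <- if (sum(is_phase) == 1) parts[is_phase] else NA_character_
  stem <- paste(parts[!is_phase], collapse = ".")
  c(stem = stem, phase = phase)
}

vars <- c("red_apple_pre", "post_banana_organic", "age")
info <- t(sapply(vars, split_name))
info

# columns without a pre/post token can then be dropped
keep <- vars[!is.na(info[, "phase"])]
keep
```

With the stem and phase in separate columns, pre and post variables can be matched on the stem, and names with several underscores no longer produce rows of unequal length.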
On 11-10-2012 15:17, Nundy, Shantanu wrote:
> Actually, I see now that part of the problem is that many of the names have
> multiple underscores such as "red_apple_pre" or "post_banana_organic". I
> think this is causing a problem for this line in your code:
>> vmat <- do.call(rbind, strsplit(vars, "_"))
> Shantanu
>
>
>
> ________________________________________
> From: Nundy, Shantanu
> Sent: Thursday, October 11, 2012 9:07 AM
> To: Rui Barradas
> Subject: RE: [R] multiple t-tests across similar variable names
>
> Rui,
> Thank you so much for your solution. It is exactly what I was struggling
> with!
>
> One small question. When I ran the code on my actual dataset I got the
> error below:
>
>> vars <- names(master)
>> vmat <- do.call(rbind, strsplit(vars, "_"))
> Warning message:
> In function (..., deparse.level = 1)  :
>    number of columns of result is not a multiple of vector length (arg 1)
>
> My guess is that the problem is that not all the variables have "pre" or
> "post" in them. Some of the variables are constants that I will not do a
> paired t-test on. What would be the easiest way to get around this, perhaps
> even by simply removing all of the variables that have neither "pre" nor
> "post" in them?
>
> Thanks again,
> Shantanu
>
>
>
>
>
>
>
> ________________________________________
> From: arun [smartpink111 at yahoo.com]
> Sent: Thursday, October 11, 2012 8:50 AM
> To: Rui Barradas
> Cc: Nundy, Shantanu
> Subject: Re: [R] multiple t-tests across similar variable names
>
> HI Rui,
>
>   Thanks for testing the code. I will look into it later.
> A.K.
>
>
>
>
> ----- Original Message -----
> From: Rui Barradas <ruipbarradas at sapo.pt>
> To: arun <smartpink111 at yahoo.com>; "Nundy, Shantanu" <snundy at chicagobooth.edu>
> Cc: R help <r-help at r-project.org>
> Sent: Thursday, October 11, 2012 9:25 AM
> Subject: Re: [R] multiple t-tests across similar variable names
>
> Hello,
>
> I have a problem: with your data example my results are different. I have
> changed the names of two of the variables, to allow for 'pre' and 'post'
> to be first in the names.
>
> # auxiliary functions
> ifswap <- function(x)
>      if(x[1] %in% c("pre", "post")) x[2:1] else x
>
> getpair <- function(i, post)
>      post[ which(vmat[post, 1] == vmat[i, 1]) ]
>
> makeLine <- function(h)
>      c(MeanDiff = unname(h$estimate),
>          CIlower = h$conf.int[1],
>          CIupper = h$conf.int[2],
>          p.value = h$p.value)
>
> doTests <- function(DF, Pairs){
>      t.list <- lapply( seq_len(nrow(Pairs)), function(i)
>          t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
>      do.call(rbind, lapply(t.list, makeLine))
> }
>
> # dataset
> set.seed(432)
> dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
>              orange_post = sample(18:28,5,replace=TRUE),
>              pre_banana = sample(25:35,5,replace=TRUE),  # here
>              apple_post = sample(20:30,5,replace=TRUE),
>              post_banana = sample(40:50,5,replace=TRUE), # and here
>              orange_pre = sample(5:10,5,replace=TRUE))
>
>
> #--------------------------------
> # start processing the data.frame
> # Make pairs of pre/post columns
> vars <- names(dat2)
> vmat <- do.call(rbind, strsplit(vars, "_"))
> vmat <- t(apply(vmat, 1, ifswap))
> pre <- which(vmat[, 2] == "pre")
> post <- which(vmat[, 2] == "post")
> post <- sapply(pre, getpair, post)
> pairs <- matrix(c(pre, post), ncol = 2)
>
> # now the tests
> result <- doTests(dat2, pairs)
> rownames(result) <- vmat[pre, 1]
> result
>
>
> In your results I believe that the values for meandifference are the means
> of x[, 1], at least that's what I've got.
> Anyway, I'll look at both pieces of code again, to try to see what's going on.
>
> Hope this helps,
>
> Rui Barradas
>

arun

2012-Oct-11 17:06 UTC

Hi Shantanu,

I guess the code below should solve both issues:

set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   pre_banana = sample(25:35, 5, replace = TRUE),
                   post_apple = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE),
                   orange_pre = sample(5:10, 5, replace = TRUE))
# move a leading "pre_" / "post_" to the end of the name
colnames(dat2) <- gsub("^pre\\_(.*)", "\\1_pre",
                       gsub("^post\\_(.*)", "\\1_post", colnames(dat2)))
dat3 <- t(dat2[order(colnames(dat2))])
dat3 <- data.frame(varName = gsub("(.*)\\_.*", "\\1", row.names(dat3)), dat3)
# split by variable name, so the pairs need not be typed by hand
list3 <- lapply(split(dat3, dat3$varName), function(x) t(x[-1]))
res3 <- do.call(rbind, lapply(lapply(list3, function(x)
    t.test(x[, 1], x[, 2], paired = TRUE)), function(x)
    data.frame(meandifference = x$estimate, CIlow = unlist(x$conf.int)[1],
               CIhigh = unlist(x$conf.int)[2], p.value = x$p.value)))
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560
A.K.




----- Original Message -----
From: "Nundy, Shantanu" <snundy at chicagobooth.edu>
To: arun <smartpink111 at yahoo.com>
Cc: 
Sent: Thursday, October 11, 2012 10:22 AM
Subject: RE: [R] multiple t-tests across similar variable names

hi Arun,
This is very helpful thanks. 

I'm running into a couple issues:
1. Since some of the variables start with "pre_apple" and others
"apple_post" sorting the variables doesn't completely put pre-post
variables next to each other.
2. I have about 50 variables so typing this line is a bit cumbersome:
> list3<-list(dat3[,1:2],dat3[,3:4],dat3[,5:6])
Thanks,
Shantanu

________________________________________
From: arun [smartpink111 at yahoo.com]
Sent: Thursday, October 11, 2012 9:14 AM
To: Rui Barradas
Cc: Nundy, Shantanu; R help
Subject: Re: [R] multiple t-tests across similar variable names

Hi Rui,

By running your code, I got the results as:
result
#       MeanDiff   CIlower    CIupper      p.value
#apple     -12.6 -16.68052  -8.519476 0.0010166626
#banana    -15.0 -17.91196 -12.088040 0.0001388506
#orange    -18.2 -22.79583 -13.604166 0.0003888560

From my code:
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

There is a difference in signs.
A.K.
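
The sign difference most likely comes from the order of the two arguments to t.test(): one version passes the pre column first, while sorting the column names alphabetically puts "apple_post" before "apple_pre", so the other version passes the post column first. A minimal sketch with made-up numbers (the vectors below are not from the thread):

```r
# Invented paired measurements
pre  <- c(10, 12, 11, 14, 13)
post <- c(21, 20, 25, 22, 27)

a <- t.test(pre, post, paired = TRUE)   # estimates mean(pre - post)
b <- t.test(post, pre, paired = TRUE)   # estimates mean(post - pre)

unname(a$estimate)
unname(b$estimate)

# The interval is mirrored around zero and the p-value is unchanged
as.numeric(a$conf.int)
as.numeric(b$conf.int)
c(a$p.value, b$p.value)
```

Either order is a valid paired test; it only matters that the table states which way the difference was taken.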





arun

2012-Oct-11 18:55 UTC

Hi Shantanu,

I saw your reply to Rui on Nabble regarding multiple underscores:

(Actually, I see now that part of the problem is that many of the
names have multiple underscores such as "red_apple_pre" or
"post_banana_organic". I think this is causing a problem for this line
in your code:)

I wasn't aware of that problem. In that case, try this:
set.seed(432)
dat2 <- data.frame(red_apple_pre = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   pre_banana_organic = sample(25:35, 5, replace = TRUE),
                   post_apple = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE),
                   orange_pre = sample(5:10, 5, replace = TRUE))
nam1 <- c("apple", "orange", "banana")
nam2 <- c("pre", "post")
# keep only the known variable name and the pre/post token from each name
colnames(dat2) <- unlist(lapply(lapply(strsplit(colnames(dat2), "_"), function(x)
    x[x %in% nam1 | x %in% nam2]), function(x) paste(x[1], x[2], sep = "_")))
# move a leading "pre_" / "post_" to the end of the name
colnames(dat2) <- gsub("^pre\\_(.*)", "\\1_pre",
                       gsub("^post\\_(.*)", "\\1_post", colnames(dat2)))
dat3 <- t(dat2[order(colnames(dat2))])
dat3 <- data.frame(varName = gsub("(.*)\\_.*", "\\1", row.names(dat3)), dat3)
list3 <- lapply(split(dat3, dat3$varName), function(x) t(x[-1]))
res3 <- do.call(rbind, lapply(lapply(list3, function(x)
    t.test(x[, 1], x[, 2], paired = TRUE)), function(x)
    data.frame(meandifference = x$estimate, CIlow = unlist(x$conf.int)[1],
               CIhigh = unlist(x$conf.int)[2], p.value = x$p.value)))
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560


I hope this works.
A.K.






