Hello,

I have a question regarding how to speed up t.test on a large dataset. For example, I have a table "tab" which looks like:

        a       b       c       d       e       f       g       h ...
1
2
3
4
5
...
100000

dim(tab) is 100000 x 100.

I need to do a t.test for each row on two subsets of the columns, i.e. to compare the group a, b, d against the group e, f, g at each row:

subset 1: columns a, b, d (rows 1 ... 100000)
subset 2: columns e, f, g (rows 1 ... 100000)

The 100000 t.tests for one such pair of subsets take around 1 min. The problem is that I have around 10000 different combinations of such subsets, so 1 min * 10000 = 10000 min if I use a "for" loop like this:

n1 <- 10000   # number of subset combinations
n2 <- 100000  # number of rows
for (i1 in 1:n1) {
    for (i2 in 1:n2) {
        # v5 and v6 are vectors containing the variable (column) names for
        # the two subsets; they are different for each outer iteration
        t.test(tab[i2, v5], tab[i2, v6])$p.value
    }
}

My question: is there a more efficient way to do these computations in a short period of time? Any packages, like plyr? Maybe direct calculations instead of using the t.test function?

Thank you.
Have a look at "Computing Thousands of Test Statistics Simultaneously in R" by Holger Schwender and Tina Müller, in
http://stat-computing.org/newsletter/issues/scgn-18-1.pdf

Hadley

On Mon, Sep 13, 2010 at 4:26 PM, Alexey Ush <ushan26 at yahoo.com> wrote:
> [original question quoted in full above; snipped]

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
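In the spirit of that article, here is a minimal vectorized sketch, assuming (as in the original post) that "tab" is a numeric matrix and v5/v6 are vectors of column names. It computes Welch two-sample t p-values for all 100000 rows of one subset pair at once in base R, replacing the per-row t.test() calls:

row_welch_p <- function(tab, v5, v6) {
    x <- tab[, v5, drop = FALSE]
    y <- tab[, v6, drop = FALSE]
    nx <- ncol(x); ny <- ncol(y)
    mx <- rowMeans(x); my <- rowMeans(y)
    vx <- rowSums((x - mx)^2) / (nx - 1)   # row-wise sample variances
    vy <- rowSums((y - my)^2) / (ny - 1)
    se2 <- vx / nx + vy / ny               # squared std. error of the difference
    tstat <- (mx - my) / sqrt(se2)
    # Welch-Satterthwaite degrees of freedom
    df <- se2^2 / ((vx / nx)^2 / (nx - 1) + (vy / ny)^2 / (ny - 1))
    2 * pt(-abs(tstat), df)                # two-sided p-values
}

# p-values for one combination of subsets:
# pvals <- row_welch_p(tab, v5, v6)

Since all the work is done by rowMeans()/rowSums() on whole matrices, one subset pair should take a fraction of a second rather than a minute, and looping this function over the 10000 subset combinations becomes the cheap part.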
See if rowttests is any faster:

library(genefilter)
?rowttests

You have to install it from Bioconductor. I've used this on large datasets, but I haven't compared timings.

On Mon, Sep 13, 2010 at 4:26 PM, Alexey Ush <ushan26 at yahoo.com> wrote:
> [original question quoted in full above; snipped]
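For example, a sketch using the same "tab", v5, and v6 as above: rowttests() takes one matrix plus a factor saying which columns belong to which group, and runs all rows in compiled code.

## one-time install from Bioconductor, e.g.:
## source("http://bioconductor.org/biocLite.R"); biocLite("genefilter")
library(genefilter)

m   <- as.matrix(tab[, c(v5, v6)])   # columns for one subset pair
fac <- factor(rep(c("grp1", "grp2"), c(length(v5), length(v6))))
res <- rowttests(m, fac)             # one t-test per row
head(res$p.value)

One caveat: if I remember right, rowttests() computes the classical pooled-variance t-test, whereas t.test() defaults to the Welch (unequal-variance) version, so the p-values won't match t.test() exactly.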