On Fri, 14 Oct 2005, Ido M. Tamir wrote:
> Hello,
> i am trying to subset a dataframe multiple times:
> something like:
>
> stats <- by(df, list(items), ttestData)
>
> ttestData <- function(df){
> t.test(df[, c(2,3,4)], df[, c(5,6,7)])
> }
>
> While this works for small data, it is too slow for my
> actual data: a 500000-row data frame with
> about 135000 different indices, subsetting the
> data frame into chunks of about 5 rows on average.
>
> Do you have any suggestions how I could speed this up?
The first step is to find out what is too slow, using Rprof(). It may be
the t.test() or it may be the by().
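A minimal profiling sketch, on synthetic data standing in for the poster's real data frame (the names `df`, `item` and the sizes here are illustrative): `Rprof()` writes sampling data to a file while the `by()` call runs, and `summaryRprof()` then tabulates where the time went.

```r
# Synthetic stand-in for the poster's data: an item column plus six
# numeric columns, 5 rows per item on average.
set.seed(1)
df <- data.frame(item = rep(1:5000, each = 5),
                 matrix(rnorm(25000 * 6), ncol = 6))

# t.test() wants numeric vectors, so flatten each 3-column block
ttestData <- function(d) t.test(unlist(d[, 2:4]), unlist(d[, 5:7]))

prof <- tempfile()
Rprof(prof)                        # start the sampling profiler
stats <- by(df, list(df$item), ttestData)
Rprof(NULL)                        # stop profiling
head(summaryRprof(prof)$by.total)  # which calls dominate the run time
```

The `by.total` table shows time inclusive of callees, which makes it easy to see whether `by()`'s bookkeeping or the repeated `t.test()` calls dominate.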
If it is the by(), you could put the numeric data into two matrices
x1 <- as.matrix(df[, 2:4])
x2 <- as.matrix(df[, 5:7])
order them so that the same "item" entries are adjacent, compute the
start and end indices for each group, and do something like
lapply(1:howevermany, function(i)
    t.test(c(x1[start[i]:end[i], ]), c(x2[start[i]:end[i], ])))
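Spelled out on made-up data, that approach might look like the following sketch; `rle()` on the sorted item column is one way to get the start/end indices, and `c()` flattens each row block into the numeric vector t.test() expects.

```r
# Illustrative data: 1000 items scattered over 5000 rows
set.seed(1)
df <- data.frame(item = sample(1:1000, 5000, replace = TRUE),
                 matrix(rnorm(5000 * 6), ncol = 6))

df <- df[order(df$item), ]          # make equal "item" rows adjacent
x1 <- as.matrix(df[, 2:4])
x2 <- as.matrix(df[, 5:7])

runs  <- rle(df$item)               # one run per distinct item
end   <- cumsum(runs$lengths)
start <- end - runs$lengths + 1

# index the matrices directly instead of splitting a data frame
tests <- lapply(seq_along(start), function(i)
    t.test(c(x1[start[i]:end[i], ]), c(x2[start[i]:end[i], ])))
```

This avoids the per-group data-frame subsetting that makes by() expensive; matrix indexing with precomputed row ranges is much cheaper.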
Even just turning df into a matrix might help.
If it is the repeated t.test() calls that are too slow, you need to speed
them up. You can probably use rowsum() to compute the means and variances
for all the t-tests at once.
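One way that vectorised route could look, sketched on made-up data (the layout and names are illustrative): rowsum() gives per-item sums and sums of squares, from which Welch t statistics for every item follow in a few vectorised lines, with no per-group function calls at all.

```r
set.seed(1)
df <- data.frame(item = rep(1:1000, each = 5),
                 matrix(rnorm(5000 * 6), ncol = 6))
x1 <- as.matrix(df[, 2:4])
x2 <- as.matrix(df[, 5:7])

# Per-item n, mean and variance via rowsum(); rowsum() orders its result
# by sort(unique(group)), so using it for the counts too keeps everything
# aligned with the sums.
grp <- function(x, item) {
    n  <- ncol(x) * as.vector(rowsum(rep(1L, nrow(x)), item))
    s1 <- rowSums(rowsum(x,   item))    # sum per item
    s2 <- rowSums(rowsum(x^2, item))    # sum of squares per item
    m  <- s1 / n
    list(n = n, mean = m, var = (s2 - n * m^2) / (n - 1))
}
a <- grp(x1, df$item)
b <- grp(x2, df$item)

se    <- sqrt(a$var / a$n + b$var / b$n)   # Welch standard error
tstat <- (a$mean - b$mean) / se
dfree <- se^4 / ((a$var / a$n)^2 / (a$n - 1) +
                 (b$var / b$n)^2 / (b$n - 1))
pval  <- 2 * pt(-abs(tstat), dfree)        # two-sided p-values
```

These match t.test()'s default Welch (unequal-variance) results item by item, but are computed for all 1000 items in a handful of whole-vector operations.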
-thomas