thr3ads.net - R help - [R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up? [Aug 2008]

Emmanuel Levy

2008-Aug-12 23:35 UTC

[R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?

Dear All,

I have a large data frame ( 2700000 lines and 14 columns), and I would like to
extract the information in a particular way illustrated below:


Given a data frame "df":
> col1=sample(c(0,1),10, rep=T)
> names = factor(c(rep("A",5),rep("B",5)))
> df = data.frame(names,col1)
> df
   names col1
1      A    1
2      A    0
3      A    1
4      A    0
5      A    1
6      B    0
7      B    0
8      B    1
9      B    0
10     B    0

I would like to transform it into the form:
> index = c("A","B")
> col1[[1]]=df$col1[which(df$name=="A")]
> col1[[2]]=df$col1[which(df$name=="B")]
My problem is that the command:  *** which(df$name=="A") ***
takes about 1 second because df is so big.

I was thinking that a "level" could maybe be accessed instantly, but I
am not sure how to do it.
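
The "level" idea can be sketched as follows. A factor stores one integer code per row plus a short table of level labels, so the comparison can be done on the integer codes. This is only a minimal sketch on the toy data from this message (the col1 values are fixed here for reproducibility, and the names codes, code_a, rows_a and vals_a are made up for illustration). Whether it is measurably faster than comparing against "A" directly depends on the R version and the data, so it is worth checking with system.time().

```r
# Toy data mirroring the example in this message (values are illustrative).
df <- data.frame(
  names = factor(c(rep("A", 5), rep("B", 5))),
  col1  = c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0)
)

# A factor holds one integer code per row and a small vector of level labels.
codes  <- as.integer(df$names)            # integer code of each row
code_a <- match("A", levels(df$names))    # position of "A" among the levels

# Compare integer codes rather than labels, then subset as before.
rows_a <- which(codes == code_a)
vals_a <- df$col1[rows_a]
```

The lookup of code_a touches only the level table, which has one entry per distinct name, not one per row.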

I would be very grateful for any advice that would allow me to speed this up.

Best wishes,

Emmanuel

Peter Cowan

2008-Aug-13 02:31 UTC

Emmanuel,

On Tue, Aug 12, 2008 at 4:35 PM, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> Dear All,
>
> I have a large data frame ( 2700000 lines and 14 columns), and I would like to
> extract the information in a particular way illustrated below:
>
>
> Given a data frame "df":
>
>> col1=sample(c(0,1),10, rep=T)
>> names = factor(c(rep("A",5),rep("B",5)))
>> df = data.frame(names,col1)
>> df
>   names col1
> 1      A    1
> 2      A    0
> 3      A    1
> 4      A    0
> 5      A    1
> 6      B    0
> 7      B    0
> 8      B    1
> 9      B    0
> 10     B    0
>
> I would like to transform it into the form:
>
>> index = c("A","B")
>> col1[[1]]=df$col1[which(df$name=="A")]
>> col1[[2]]=df$col1[which(df$name=="B")]

I'm not sure I fully understand your problem; your example would not run for
me.

You could get a small speedup by omitting which(): you can subset with the
logical vector directly.

> n <- 2700000
> foo <- data.frame(
+ 	one = sample(c(0,1), n, rep = T),
+ 	two = factor(c(rep("A", n/2 ),rep("B", n/2 )))
+ 	)
> system.time(out <- which(foo$two=="A"))
   user  system elapsed
  0.566   0.146   0.761
> system.time(out <- foo$two=="A")
   user  system elapsed
  0.429   0.075   0.588
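
To make the comparison above concrete, here is a small sketch showing that subsetting with the logical vector and subsetting with which() select the same rows when the comparison contains no NA. The seed, the reduced n, and the names keep, via_logical, via_which, x and flag are assumptions added for illustration, not part of the timings above.

```r
set.seed(1)   # arbitrary seed so the sketch is reproducible
n <- 1000     # small n for illustration; the timings above used 2700000
foo <- data.frame(
  one = sample(c(0, 1), n, replace = TRUE),
  two = factor(c(rep("A", n / 2), rep("B", n / 2)))
)

keep <- foo$two == "A"                 # logical vector, one entry per row
via_logical <- foo$one[keep]           # subset directly with the logical vector
via_which   <- foo$one[which(keep)]    # same rows via integer positions

# The two forms differ only when the comparison yields NA:
x    <- c(10, 20, 30)
flag <- c(TRUE, NA, FALSE)
x[flag]          # keeps an NA placeholder for the NA entry
x[which(flag)]   # drops the NA entry
```

If the column can contain missing values, which() is the safer form, at the cost of the extra pass timed above.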

You might also find use for unstack(), though I didn't see a
speedup.
> system.time(out <- unstack(foo))
   user  system elapsed
  1.068   0.697   2.004
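
If the goal is one vector of values per name, a related base R option is split(), which groups a vector by a factor in a single call instead of calling which() once per level. The following is a minimal sketch on the toy data from the original question (col1 values fixed for reproducibility; by_name is a made-up name). No timing is claimed for it here.

```r
# Toy data mirroring the original question (values are illustrative).
df <- data.frame(
  names = factor(c(rep("A", 5), rep("B", 5))),
  col1  = c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0)
)

# split() returns a named list with one element per factor level.
by_name <- split(df$col1, df$names)

index <- names(by_name)   # the level labels, in level order
by_name[["A"]]            # values of col1 where names == "A"
by_name[["B"]]            # values of col1 where names == "B"
```

The list is indexed by level label, so by_name[[index[1]]] plays the role of col1[[1]] in the question.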

HTH

Peter
> My problem is that the command:  *** which(df$name=="A") ***
> takes about 1 second because df is so big.
>
> I was thinking that a "level" could maybe be accessed instantly but I am not
> sure about how to do it.
>
> I would be very grateful for any advice that would allow me to speed this up.
>
> Best wishes,
>
> Emmanuel
